NOTES FOR CMOS DEVICES


PRECAUTION AGAINST ESD FOR SEMICONDUCTORS

Note: 

A strong electric field applied to a MOS device can destroy the gate oxide and ultimately
degrade device operation. Steps must be taken to prevent the generation of static electricity
as far as possible, and to dissipate it quickly when it does occur. Environmental control
must be adequate: when the air is dry, a humidifier should be used. Avoid insulators that
readily build up static electricity. Semiconductor devices must be stored and transported
in an anti-static container, static shielding bag, or conductive material. All test and
measurement tools, including the work bench and floor, should be grounded. The operator should
be grounded using a wrist strap. Semiconductor devices must not be touched with bare hands.
Similar precautions must be taken for PW boards with semiconductor devices mounted on them.


HANDLING OF UNUSED INPUT PINS FOR CMOS 

Note: 

An unconnected CMOS device input can cause malfunction. If an input pin is left unconnected,
noise or similar effects may generate an internal input level and cause malfunction. CMOS
devices behave differently from bipolar or NMOS devices. Input levels of CMOS devices must be
fixed high or low by using pull-up or pull-down circuitry. Each unused pin should be connected
to VDD or GND through a resistor if there is any possibility that it will be an output pin.
The handling of unused pins must be judged device by device, following the specifications
governing each device.


STATUS BEFORE INITIALIZATION OF MOS DEVICES 

Note: 

Power-on does not necessarily define the initial status of a MOS device. The production process
of a MOS device does not define its initial operating status. Immediately after the power source
is turned ON, devices with a reset function have not yet been initialized. Hence, power-on does
not guarantee output pin levels, I/O settings, or register contents. A device is not initialized
until the reset signal is received. The reset operation must be executed immediately after
power-on for devices that have a reset function.


Vr3000, Vr4200, Vr4300, Vr4400, Vr5000, Vr10000, Vr12000, and Vr-Series are trademarks of NEC Corporation.
RISCompiler, RISC/os, R2000, R3000, R4000, and R6000 are trademarks of MIPS Computer Systems Inc.
MIPS, R4200, R4300, R4400, R8000, and R10000 are trademarks of MIPS Technologies, Inc.

UNIX is a registered trademark in the United States and other countries, licensed exclusively through 
X/Open Company, Ltd. 


The export of this product from Japan is prohibited without governmental license. To export or re-export this product from 
a country other than Japan may also be prohibited without a license from that country. Please call an NEC sales 
representative. 


Exporting this product or equipment that includes this product may require a governmental license from the
U.S.A. for some countries because this product utilizes technologies limited by the export control regulations
of the U.S.A.


The information in this document is subject to change without notice. 

No part of this document may be copied or reproduced in any form or by any means without the prior written 
consent of NEC Corporation. NEC Corporation assumes no responsibility for any errors which may appear in 
this document. 

NEC Corporation does not assume any liability for infringement of patents, copyrights or other intellectual 
property rights of third parties by or arising from use of a device described herein or any other liability arising 
from use of such device. No license, either express, implied or otherwise, is granted under any patents, 
copyrights or other intellectual property rights of NEC Corporation or others. 

While NEC Corporation has been making continuous effort to enhance the reliability of its semiconductor devices, 
the possibility of defects cannot be eliminated entirely. To minimize risks of damage or injury to persons or 
property arising from a defect in an NEC semiconductor device, customers must incorporate sufficient safety
measures in their design, such as redundancy, fire-containment, and anti-failure features.

NEC devices are classified into the following three quality grades: 

"Standard", "Special", and "Specific". The Specific quality grade applies only to devices developed based on 
a customer designated "quality assurance program" for a specific application. The recommended applications 
of a device depend on its quality grade, as indicated below. Customers must check the quality grade of each 
device before using it in a particular application. 

Standard: Computers, office equipment, communications equipment, test and measurement equipment, 
audio and visual equipment, home electronic appliances, machine tools, personal electronic 
equipment and industrial robots 

Special: Transportation equipment (automobiles, trains, ships, etc.), traffic control systems, anti-disaster 
systems, anti-crime systems, safety equipment and medical equipment (not specifically designed 
for life support) 

Specific: Aircrafts, aerospace equipment, submersible repeaters, nuclear reactor control systems, life 
support systems or medical equipment for life support, etc. 

The quality grade of NEC devices is "Standard" unless otherwise specified in NEC's Data Sheets or Data Books. 
If customers intend to use NEC devices for applications other than those specified for Standard quality grade, 
they should contact an NEC sales representative in advance. 

Anti-radioactive design is not implemented in this product. 
Regional Information


Some information contained in this document may vary from country to country. Before using any NEC 
product in your application, please contact the NEC office in your country to obtain a list of authorized 
representatives and distributors. They will verify: 


* Device availability 
* Product release schedule


* Availability of related technical literature 


* Development environment specifications (for example, specifications for third-party tools and 
In addition, trademarks, registered trademarks, export restrictions, and other legal issues may also vary
from country to country.


PREFACE


This manual is intended for users who want to understand the functions of the Vr10000 and
Vr12000 and to design application systems using these microprocessors.


This manual introduces the architecture and hardware functions of the Vr10000 and Vr12000,
following the organization described below.


This manual consists of the following contents: 
* Introduction
* Cache
* Hardware
* Coprocessor 0
* Floating point unit
* Memory management system
* Exception processing
* Instruction set details


It is assumed that the reader of this manual has general knowledge in the fields of electrical
engineering, logic circuits, and microcomputers.


The R3000™ in this manual represents the Vr3000™. 
The R4200™ in this manual represents the Vr4200™. 
The R4300™ in this manual represents the Vr4300™. 
The R4400™ in this manual represents the Vr4400™. 
The R10000™ in this manual represents the Vr10000™. 
The R12000™ in this manual represents the Vr12000™.


To learn about the detailed function of a specific instruction:
-> Read Chapter 14 Floating-Point Unit or Chapter 16 CPU Exceptions, or refer
   to the Vr5000, Vr10000 User’s Manual INSTRUCTION, which is separately available.


To learn about the overall functions of the Vr10000 and Vr12000:
-> Read this manual in sequential order.


To learn about electrical specifications:
-> Refer to the Data Sheet, which is separately available.


Unless otherwise specified, the R10000 is treated as the representative model throughout 
this document. 


Legend


Data significance:        Higher on left and lower on right
Active low:               XXX*
Numeric representation:   binary ... XXXX or XXXX₂
                          decimal ... XXXX
                          hexadecimal ... 0xXXXX
Important information:    Underlined


Related Documents


The related documents indicated here may include preliminary versions. However, preliminary
versions are not marked as such.


* Data sheet
  Vr10000, Vr12000 Data Sheet                   To be issued


* User’s Manual
  Vr5000, Vr10000 User’s Manual INSTRUCTION     U12754E
Chapter 1 Introduction to the R10000 Processor


This user’s manual describes the R10000 superscalar microprocessor for the system
designer, paying special attention to the external interface and the transfer protocols.


This chapter describes the following:
* MIPS™ ISA
* what makes a generic superscalar microprocessor
* specifics of the R10000 superscalar microprocessor
* implementation-specific CPU instructions


1.1 MIPS Instruction Set Architecture (ISA) 
MIPS has defined an instruction set architecture (ISA), implemented in the following sets 
of CPU designs: 
* MIPS I, implemented in the R2000™ and R3000
* MIPS II, implemented in the R6000™
* MIPS III, implemented in the R4400
* MIPS IV, implemented in the R8000™ and R10000


The original MIPS I CPU ISA has been extended forward three times, as shown in Figure
1-1; each extension is backward compatible. The ISA extensions are inclusive; each new
architecture level (or version) includes the former levels.†


[Figure: nested ISA levels, MIPS I innermost, then MIPS II, MIPS III, and MIPS IV, each
enclosing the earlier levels]


Figure 1-1 MIPS ISA with Extensions


The practical result is that a processor implementing MIPS IV is also able to run MIPS I,
MIPS II, or MIPS III binary programs without change.


† For more ISA information, please refer to the MIPS IV Instruction Set Architecture, published
by MIPS Technologies, and written by Charles Price. Contact information is provided both
in the Preface, and inside the front cover, of this manual.
1.2 What is a Superscalar Processor?


A superscalar processor is one that can fetch, execute and complete more than one 
instruction in parallel. 


Pipeline and Superpipeline Architecture


Previous MIPS processors had linear pipeline architectures; an example of such a linear
pipeline is the R4400 superpipeline, shown in Figure 1-2. In the R4400 superpipeline
architecture, an instruction is executed each cycle of the pipeline clock (PCycle), or each
pipe stage.


                 |<- 1 PCycle ->|
Instruction 1    IF  IS  RF  EX  DF  DS  TC  WB
Instruction 2        IF  IS  RF  EX  DF  DS  TC  WB
Instruction 3            IF  IS  RF  EX  DF  DS  TC  WB
Instruction 4                IF  IS  RF  EX  DF  DS  TC  WB


Figure 1-2 R4400 Pipeline


Superscalar Architecture


The structure of a 4-way superscalar pipeline is shown in Figure 1-3. At each stage, four
instructions are handled in parallel. Note that there is only one EX stage for integers.


Instruction 1    IF  ID  IS  EX  WB
Instruction 2    IF  ID  IS  EX  WB
Instruction 3    IF  ID  IS  EX  WB
Instruction 4    IF  ID  IS  EX  WB
Instruction 5        IF  ID  IS  EX  WB
Instruction 6        IF  ID  IS  EX  WB
Instruction 7        IF  ID  IS  EX  WB
Instruction 8        IF  ID  IS  EX  WB

IF = instruction fetch
ID = instruction decode and dependency
IS = instruction issue
EX = execution (1 only)
WB = write back


Figure 1-3 4-Way Superscalar Pipeline
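

As a rough check on the pipeline figures in this section, the number of cycles an ideal
pipeline needs can be computed directly. The sketch below is illustrative only: the function
name and the no-stall assumption are not taken from this manual, and real code rarely
sustains the ideal rate.

```python
import math

def ideal_cycles(n_instr, stages, width=1):
    """Cycles to finish n_instr instructions in an idealized pipeline.

    Assumption (hypothetical model): no stalls, `width` instructions enter
    the first stage every cycle, and each advances one stage per cycle.
    """
    groups = math.ceil(n_instr / width)   # issue groups entering the pipe
    return stages + groups - 1

# Figure 1-3: eight instructions, five stages (IF ID IS EX WB), four per cycle
superscalar = ideal_cycles(8, stages=5, width=4)
# The same eight instructions through a one-per-cycle, five-stage pipeline
scalar = ideal_cycles(8, stages=5, width=1)
```

Under these assumptions the 4-way arrangement completes the eight instructions in 6 cycles,
against 12 cycles for the one-per-cycle pipeline.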
1.3 What is an R10000 Microprocessor?


The R10000 processor is a single-chip superscalar RISC microprocessor that is a follow- 
on to the MIPS RISC processor family that includes, chronologically, the R2000, R3000, 
R6000, R4400, and R8000. 


The R10000 processor uses the MIPS ANDES architecture, or Architecture with Non- 
sequential Dynamic Execution Scheduling. 


The R10000 processor has the following major features (terms in bold are defined in the 
Glossary): 


* it implements the 64-bit MIPS IV instruction set architecture (ISA)
* it can decode four instructions each pipeline cycle, appending them to one of
  three instruction queues
* it has five execution pipelines connected to separate internal integer and
  floating-point execution (or functional) units
* it uses dynamic instruction scheduling and out-of-order execution
* it uses speculative instruction issue (also termed “speculative branching”)
* it uses a precise exception model (exceptions can be traced back to the
  instruction that caused them)
* it uses non-blocking caches
* it has separate on-chip 32-Kbyte primary instruction and data caches
* it has individually-optimized secondary cache and System interface ports
* it has an internal controller for the external secondary cache
* it has an internal System interface controller with multiprocessor support


The R10000 processor is implemented using 0.35-micron CMOS VLSI circuitry on a single
17 mm-by-18 mm chip that contains about 6.7 million transistors, including about 4.4
million transistors in its primary caches.
R10000 Superscalar Pipeline


The R10000 superscalar processor fetches and decodes four instructions in parallel each
cycle (or pipeline stage). Each pipeline includes stages for fetching (stage 1 in Figure 1-4),
decoding (stage 2), issuing instructions (stage 3), reading register operands (stage 3),
executing instructions (stages 4 through 6), and storing results (stage 7).


[Figure: seven pipeline stages (stage 1 Fetch, stage 2 Decode, stage 3 Issue, stages 4 to 6
Execute, stage 7 Store). A four-instruction-per-cycle fetch and decode path (instruction fetch
pipeline, primary instruction cache, decode, and a branch unit that handles one branch each
cycle) feeds five execution pipelines:
  * FP add pipeline (FP queue): FAdd-1, FAdd-2, FAdd-3, result
  * FP multiply pipeline (FP queue): FMpy-1, FMpy-2, FMpy-3, result
  * Integer ALU pipeline (integer queue): issue, RF, ALU1, result
  * Integer ALU pipeline (integer queue): issue, RF, ALU2, result
  * Load/store pipeline (address queue): issue, RF, address calculation, data cache and TLB, result
Operands are read from the floating-point or integer register files; the data cache is a
2-way interleaved cache.]


Figure 1-4 Superscalar Pipeline Architecture in the R10000
Instruction Queues


As shown in Figure 1-4, each instruction decoded in stage 2 is appended to one of three
instruction queues:

* integer queue
* address queue
* floating-point queue


Execution Pipelines 
The three instruction queues can issue (see the Glossary for a definition of issue) one new 
instruction per cycle to each of the five execution pipelines: 
* the integer queue issues instructions to the two integer ALU pipelines
* the address queue issues one instruction to the Load/Store Unit pipeline
* the floating-point queue issues instructions to the floating-point adder and
  multiplier pipelines


A sixth pipeline, the fetch pipeline, reads and decodes instructions from the instruction 
cache. 


Load/store dependency is speculatively ignored (R12000) 


When a load follows a store in program order, and the address of the load is known to the
Address Queue (AQ) before the address of the store, then the AQ may speculatively issue
the load to tag-check and data access. When the address of the store is determined, the AQ
can undo the effects of the load through the use of the “soft-exception” mechanism. Since
almost all loads which are actually dependent on previous stores use the same registers to
form their addresses, normally either the two instructions are independent, or their
addresses are resolved in program order, so the soft-exception should occur rarely.


64-bit Integer ALU Pipeline 


The 64-bit integer pipeline has the following characteristics: 
* it has a 16-entry integer instruction queue that dynamically issues instructions
* it has a 64-bit 64-location integer physical register file, with seven read and
  three write ports (32 logical registers; see register renaming in the Glossary)
* it has two 64-bit arithmetic logic units:
  - ALU1 contains an arithmetic-logic unit, shifter, and integer branch
    comparator
  - ALU2 contains an arithmetic-logic unit, integer multiplier, and divider
Load/Store Pipeline


The load/store pipeline has the following characteristics:

* it has a 16-entry address queue that dynamically issues instructions, and uses
  the integer register file for base and index registers
* it has a 16-entry address stack for use by non-blocking loads and stores
* it has a 44-bit virtual address calculation unit
* it has a 64-entry fully associative Translation-Lookaside Buffer (TLB),
  which converts virtual addresses to physical addresses, using a 40-bit physical
  address. Each entry maps two pages, with sizes ranging from 4 Kbytes to 16
  Mbytes, in powers of 4.


64-bit Floating-Point Pipeline


The 64-bit floating-point pipeline has the following characteristics:

* it has a 16-entry instruction queue, with dynamic issue
* it has a 64-bit 64-location floating-point physical register file, with five read
  and three write ports (32 logical registers)
* it has a 64-bit parallel multiply unit (3-cycle pipeline with 2-cycle latency)
  which also performs move instructions
* it has a 64-bit add unit (3-cycle pipeline with 2-cycle latency) which handles
  addition, subtraction, and miscellaneous floating-point operations
* it has separate 64-bit divide and square-root units which can operate
  concurrently (these units share their issue and completion logic with the
  floating-point multiplier)


A block diagram of the processor and its interfaces is shown in Figure 1-5, followed by a 
description of its major logical blocks. 
[Figure: block diagram of the R10000 and its interfaces. Labels recoverable from the original:
  * System interface: connects to an external agent or cluster coordinator over the System bus
    (64-bit data, 8-bit check, 12-bit command); up to 4 R10000 microprocessors may be directly
    connected
  * Secondary cache: 512 Kbytes to 16 Mbytes of synchronous static RAM (a 4-Mbyte cache requires
    ten 256Kx18-bit RAM chips), driven by the on-chip secondary cache controller; 128-bit refill
    or writeback
  * Instruction cache: 32 Kbytes, 2-way set associative, 16-word blocks, unaligned access,
    128-bit refill, four 32-bit instruction fetch
  * Data cache: 32 Kbytes, 2-way set associative, 2 banks, 8-word blocks, 64-bit load or store
  * Instruction decode and register mapping, integer queue, 64 integer registers]


Figure 1-5 Block Diagram of the R10000 Processor
Functional Units


The five execution pipelines allow overlapped instruction execution by issuing instructions
to the following five functional units:

* two integer ALUs (ALU1 and ALU2)
* the Load/Store unit (address calculate)
* the floating-point adder
* the floating-point multiplier


There are also three “iterative” units to compute more complex results:

* Integer multiply and divide operations are performed by an Integer Multiply/
  Divide execution unit; these instructions are issued to ALU2. ALU2 remains
  busy for the duration of the divide.
* Floating-point divides are performed by the Divide execution unit; these
  instructions are issued to the floating-point multiplier.
* Floating-point square roots are performed by the Square-root execution unit;
  these instructions are issued to the floating-point multiplier.


Increase in pre-decode buffering (R12000) 


Up to 12 instructions may be buffered before being decoded. This should normally be
invisible to the end user, but can be important when debugging systems in uncached mode,
since fetch and decode are now further decoupled.


Primary Instruction Cache (I-cache) 


The primary instruction cache has the following characteristics: 


* it contains 32 Kbytes, organized into 16-word blocks, is 2-way set associative,
  using a least-recently used (LRU) replacement algorithm
* it reads four consecutive instructions per cycle, beginning on any word
  boundary within a cache block, but cannot fetch across a block boundary
* its instructions are predecoded, its fields are rearranged, and a 4-bit unit select
  code is appended
* it checks parity on each word
* it permits non-blocking instruction fetch


Primary Data Cache (D-cache) 


The primary data cache has the following characteristics: 
* it has two interleaved arrays (two 16 Kbyte ways)
* it contains 32 Kbytes, organized into 8-word blocks, is 2-way set associative,
  using an LRU replacement algorithm
* it handles 64-bit load/store operations
* it handles 128-bit refill or write-back operations
* it permits non-blocking loads and stores
* it checks parity on each byte
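

The cache parameters listed above fix the number of sets in each primary cache. The sketch
below works through that arithmetic; it assumes a 4-byte word, and the function name and the
returned field widths are illustrative rather than taken from this manual (which does not
describe here how the index is formed).

```python
def cache_geometry(size_bytes, ways, block_words, word_bytes=4):
    """Return (block_bytes, sets, offset_bits, index_bits) for a set-associative cache.

    Assumption: a word is 4 bytes and all sizes are powers of two.
    """
    block_bytes = block_words * word_bytes
    sets = size_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    index_bits = sets.bit_length() - 1           # log2 of the set count
    return block_bytes, sets, offset_bits, index_bits

# Primary instruction cache: 32 Kbytes, 2-way, 16-word blocks
icache = cache_geometry(32 * 1024, ways=2, block_words=16)
# Primary data cache: 32 Kbytes, 2-way, 8-word blocks
dcache = cache_geometry(32 * 1024, ways=2, block_words=8)
```

With these inputs the instruction cache has 64-byte blocks in 256 sets, and the data cache
has 32-byte blocks in 512 sets; each way holds 16 Kbytes in both cases.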
Branch Target Address Cache (R12000)


This 32-entry two-way set-associative cache holds the target addresses of previously-taken
branches. When a branch is executed, a hit in the BTAC eliminates the one-cycle fetch
bubble which the R10000 experiences for every taken branch. However, if a branch which
hits in the BTAC is actually predicted not-taken, then a one-cycle fetch bubble is introduced
where none was present before. Performance simulations indicate that the BTAC is a net
win, but because of its “mixed-blessing” nature, a mechanism has been provided to disable
it via software (see the description of changes to the Diag register).


Instruction Decode And Rename Unit


The instruction decode and rename unit has the following characteristics:

* it processes 4 instructions in parallel
* it replaces logical register numbers with physical register numbers (register
  renaming)
  - it maps integer registers into a 33-word-by-6-bit mapping table that has
    4 write and 12 read ports
  - it maps floating-point registers into a 32-word-by-6-bit mapping table
    that has 4 write and 16 read ports
* it has a 32-entry active list of all instructions within the pipeline


Branch Unit


The branch unit has the following characteristics:

* it allows one branch per cycle
* conditional branches can be executed speculatively, up to 4-deep
* it has a 44-bit adder to compute branch addresses
* it has a 4-quadword branch-resume buffer, used for reversing mispredicted
  speculatively-taken branches
* the Branch Return Cache contains four instructions following a subroutine
  call, for rapid use when returning from leaf subroutines
* it has program trace RAM that stores the program counter for each instruction
  in the pipeline
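

The register renaming performed by the decode and rename unit can be pictured with a small
model. This is a minimal sketch, assuming 32 logical and 64 physical registers as stated for
the register files; the class, its identity initial mapping, and the free-list order are
hypothetical and do not describe the actual mapping-table hardware.

```python
class RenameMap:
    """Toy logical-to-physical register map with a free list (illustrative only)."""

    def __init__(self, logical=32, physical=64):
        self.table = list(range(logical))            # logical -> physical (assumed identity at start)
        self.free = list(range(logical, physical))   # physical registers not yet assigned

    def rename(self, dest, srcs):
        """Rename one instruction; returns (new_dest, phys_srcs, old_dest) or None if no register is free."""
        if not self.free:
            return None                              # a real decoder would stall here
        phys_srcs = [self.table[s] for s in srcs]    # sources use the current mappings
        old_dest = self.table[dest]
        new_dest = self.free.pop(0)
        self.table[dest] = new_dest                  # later readers of dest see the new register
        return new_dest, phys_srcs, old_dest

    def graduate(self, old_dest):
        """Return the superseded physical register to the free list."""
        self.free.append(old_dest)

rmap = RenameMap()
first = rmap.rename(dest=1, srcs=[2, 3])    # r1 <- r2 op r3
second = rmap.rename(dest=1, srcs=[1])      # r1 <- op r1, reads the renamed r1
```

Each physical register number fits in 6 bits because 64 locations need exactly 6 address
bits, which matches the 6-bit width of the mapping tables listed above.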
External Interfaces


The external interfaces have the following characteristics:

* a 64-bit System interface allows direct connection for 2-way to
  4-way multiprocessor systems. 8-bit ECC (Error Check and Correction) is
  applied to address and data transfers.
* a secondary cache interface with a 128-bit data path and tag fields. 9-bit ECC
  is applied to data quadwords, and 7-bit ECC to tag words. It allows connection
  to an external secondary cache that can range from 512 Kbytes to 16 Mbytes,
  using external static RAMs. The secondary cache can be organized into either
  16- or 32-word blocks, and is 2-way set associative.


Bit definitions are given in Chapter 3. 


Additional cycles for System Interface transactions (R12000) 


All transactions which go through the system interface unit (in particular, SCache refills 
and writebacks) have one additional CPU-clock of latency added to them. 
1.4 Instruction Queues


The processor keeps decoded instructions in three instruction queues, which dynamically 
issue instructions to the execution units. The queues allow the processor to fetch 
instructions at its maximum rate, without stalling because of instruction conflicts or 
dependencies. 


Each queue uses instruction tags to keep track of the instruction in each execution pipeline 
stage. These tags set a Done bit in the active list as each instruction is completed. 


FP and Integer-Queue Issue Policy (R12000)


The integer and floating-point queues are altered so that they are now composed of two
8-entry banks. Instructions are issued into the two banks in an alternating fashion. Each bank
independently nominates instructions for the functional units (FUs). For each FU, each bank
nominates the oldest instruction it contains which is ready to execute. If both banks
nominate an instruction for a given FU, a winner is chosen by a priority bit which alternates
between the two banks on each cycle.


Integer Queue


The integer queue issues instructions to the two integer arithmetic units: ALU1 and ALU2.


The integer queue contains 16 instruction entries. Up to four instructions may be written 
during each cycle; newly-decoded integer instructions are written into empty entries in no 
particular order. Instructions remain in this queue only until they have been issued to an 
ALU. 


Branch and shift instructions can be issued only to ALU1. Integer multiply and divide 
instructions can be issued only to ALU2. Other integer instructions can be issued to either 
ALU. 
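

The ALU restrictions above amount to a small lookup. The sketch below is illustrative: the
instruction-class names and the function are hypothetical labels chosen for this example,
not identifiers from the manual.

```python
# Instruction classes restricted to one ALU, per the paragraph above
ALU1_ONLY = {"branch", "shift"}
ALU2_ONLY = {"multiply", "divide"}

def eligible_alus(kind):
    """Return the ALUs that an integer instruction of the given class may be issued to."""
    if kind in ALU1_ONLY:
        return ("ALU1",)
    if kind in ALU2_ONLY:
        return ("ALU2",)
    return ("ALU1", "ALU2")   # all other integer instructions may use either ALU

routes = {kind: eligible_alus(kind) for kind in ("branch", "shift", "multiply", "divide", "add")}
```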


The integer queue controls six dedicated ports to the integer register file: two operand read 
ports and a destination write port for each ALU. 
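

The two-bank nomination policy described for the R12000 earlier in this section can be
sketched as follows. This is a simplified model under stated assumptions: the Entry fields,
the function names, and the use of a decode-order age number are hypothetical, and only the
selection rule (oldest ready entry per bank, then the priority bit) comes from the text.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    age: int      # decode order, lower means older (assumed bookkeeping)
    unit: str     # functional unit the instruction needs
    ready: bool   # True once all operands are valid

def nominate(bank, unit):
    """Oldest ready instruction in one bank for the given functional unit, or None."""
    candidates = [e for e in bank if e.unit == unit and e.ready]
    return min(candidates, key=lambda e: e.age) if candidates else None

def select(banks, unit, priority):
    """Pick the winner for one functional unit; `priority` is the bank favored this cycle."""
    nominees = [nominate(bank, unit) for bank in banks]
    if nominees[0] is not None and nominees[1] is not None:
        return nominees[priority]          # both banks nominated: the priority bit decides
    return nominees[0] if nominees[0] is not None else nominees[1]

banks = [
    [Entry(0, "ALU1", True), Entry(2, "ALU1", True)],    # bank 0
    [Entry(1, "ALU1", True), Entry(3, "ALU2", False)],   # bank 1
]
winner_cycle_a = select(banks, "ALU1", priority=0)
winner_cycle_b = select(banks, "ALU1", priority=1)
```

Note that when the priority bit favors bank 1, its nominee wins even though bank 0 holds an
older ready instruction; in this model, age only orders instructions within a bank.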


Address calculation for load/store instructions uses integer queue (R12000) 
When load, store, cacheop, or prefetch instructions are decoded, they are sent to both the
AQ and IQ units. The IQ treats the address-calculate unit as a third “ALU” and issues 
instructions to it. When an instruction completes address calculation, the results are 
forwarded to the AQ. Unlike previously, if an address instruction must be retried for any 
reason, address calculation is not redone. If the address queue is full, but the integer queue 
has free entries at the time a load/store instruction is decoded, the load/store is sent only to 
the integer queue. When the address queue has an available entry the calculated address is 
forwarded to that entry and the remainder of the load/store execution continues. 


Floating-Point Queue


The floating-point queue issues instructions to the floating-point multiplier and the
floating-point adder.


The floating-point queue contains 16 instruction entries. Up to four instructions may be
written during each cycle; newly-decoded floating-point instructions are written into empty
entries in random order. Instructions remain in this queue only until they have been issued
to a floating-point execution unit.


The floating-point queue controls six dedicated ports to the floating-point register file: two
operand read ports and a destination port for each execution unit.


The floating-point queue uses the multiplier’s issue port to issue instructions to the
square-root and divide units. These instructions also share the multiplier’s register ports.


The floating-point queue contains simple sequencing logic for multiple-pass instructions
such as Multiply-Add. These instructions require one pass through the multiplier, then one
pass through the adder.


Address Queue


The address queue issues instructions to the load/store unit.


The address queue contains 16 instruction entries. Unlike the other two queues, the address 
queue is organized as a circular First-In First-Out (FIFO) buffer. A newly decoded load/ 
store instruction is written into the next available sequential empty entry; up to four 
instructions may be written during each cycle. 


The FIFO order maintains the program’s original instruction sequence so that memory 
address dependencies may be easily computed. 


Instructions remain in this queue until they have graduated; they cannot be deleted 
immediately after being issued, since the load/store unit may not be able to complete the 
operation immediately. 


The address queue contains more complex control logic than the other queues. An issued 
instruction may fail to complete because of a memory dependency, a cache miss, or a 
resource conflict; in these cases, the queue must continue to reissue the instruction until it 
is completed. 
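

The circular FIFO behavior described above can be modeled in a few lines. This is a minimal
sketch, assuming that graduation simply removes the oldest completed entry; the class and
method names are hypothetical, and the reissue, dependency, and active-list logic of the real
queue is not modeled.

```python
class AddressQueue:
    """Toy 16-entry circular FIFO for load/store instructions (illustrative only)."""
    SIZE = 16
    MAX_WRITES_PER_CYCLE = 4

    def __init__(self):
        self.slots = [None] * self.SIZE
        self.head = 0      # index of the oldest entry
        self.count = 0     # number of occupied entries

    def write(self, names):
        """Append up to four instructions in program order; returns those accepted."""
        accepted = []
        for name in names[: self.MAX_WRITES_PER_CYCLE]:
            if self.count == self.SIZE:
                break                                   # queue full
            tail = (self.head + self.count) % self.SIZE
            self.slots[tail] = {"name": name, "done": False}
            self.count += 1
            accepted.append(name)
        return accepted

    def mark_done(self, name):
        """Record that an issued instruction has completed (it stays queued until it graduates)."""
        for k in range(self.count):
            slot = self.slots[(self.head + k) % self.SIZE]
            if slot["name"] == name:
                slot["done"] = True

    def graduate(self):
        """Remove the oldest entry if it has completed; otherwise return None."""
        if self.count == 0 or not self.slots[self.head]["done"]:
            return None
        name = self.slots[self.head]["name"]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.SIZE
        self.count -= 1
        return name

queue = AddressQueue()
accepted = queue.write(["lw1", "sw2", "lw3", "lw4", "lw5"])   # only four fit in one cycle
queue.mark_done("sw2")
blocked = queue.graduate()      # the oldest entry (lw1) is not done yet
queue.mark_done("lw1")
first_out = queue.graduate()
```

Keeping entries in program order is what lets a younger completed instruction (sw2 here)
wait behind an older one that has not yet finished.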


The address queue has three issue ports: 


* First, it issues each instruction once to the address calculation unit. This unit
uses a 2-stage pipeline to compute the instruction’s memory address and to 
translate it in the TLB. Addresses are stored in the address stack and in the 
queue’s dependency logic. This port controls two dedicated read ports to the 
integer register file. If the cache is available, it is accessed at the same time as 
the TLB. A tag check can be performed even if the data array is busy. 


* Second, the address queue can re-issue accesses to the data cache. The queue
allocates usage of the four sections of the cache, which consist of the tag and 
data sections of the two cache banks. Load and store instructions begin with 
a tag check cycle, which checks to see if the desired address is already in 
cache. If it is not, a refill operation is initiated, and this instruction waits until 
it has completed. Load instructions also read and align a doubleword value 
from the data array. This access may be either concurrent to or subsequent to 
the tag check. If the data is present and no dependencies exist, the instruction 
is marked done in the queue. 


* Third, the address queue can issue store instructions to the data cache. A store
instruction may not modify the data cache until it graduates. Only one store 
can graduate per cycle, but it may be anywhere within the four oldest 
instructions, if all previous instructions are already completed. 


The access and store ports share four register file ports (integer read and write,
floating-point read and write). These shared ports are also used for Jump and Link and Jump
Register instructions, and for move instructions between the integer and floating-point
register files.
1.5 Program Order and Dependencies


From a programmer’s perspective, instructions appear to execute sequentially, since they 
are fetched and graduated in program order (the order they are presented to the processor 
by software). When an instruction stores a new value in its destination register, that new 
value is immediately available for use by subsequent instructions. 


Internal to the processor, however, instructions are executed dynamically, and some results 
may not be available for many cycles; yet the hardware must behave as if each instruction 
is executed sequentially. 


This section describes various conditions and dependencies that can arise from them in 
pipeline operation, including: 


* instruction dependencies
* execution order and stalling
* branch prediction and speculative execution
* resolving operand dependencies
* resolving exception dependencies


Instruction Dependencies 


Each instruction depends on all previous instructions which produced its operands, because 
it cannot begin execution until those operands become valid. These dependencies 
determine the order in which instructions can be executed. 


Execution Order and Stalling 


The actual execution order depends on the processor’s organization; in a typical pipelined 
processor, instructions are executed only in program order. That is, the next sequential 
instruction may begin execution during the next cycle, if all of its operands are valid. 
Otherwise, the pipeline stalls until the operands do become valid. 


Since instructions execute in order, stalls usually delay all subsequent instructions. 


A clever compiler can improve performance by re-arranging instructions to reduce the 
frequency of these stall cycles. 


* In an in-order superscalar processor, several consecutive instructions may
begin execution simultaneously, if all their operands are valid, but the
processor stalls at any instruction whose operands are still busy.


* In an out-of-order superscalar processor, such as the R10000, instructions are
decoded and stored in queues. Each instruction is eligible to begin execution
as soon as its operands become valid, independent of the original instruction
sequence. In effect, the hardware rearranges instructions to keep its execution
units busy. This process is called dynamic issuing.
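

The difference between in-order stalling and dynamic issuing can be illustrated with a small timing model. The sketch below is not a description of the R10000 hardware. It assumes one instruction is decoded per cycle, unlimited execution units, and renaming that leaves only true (read-after-write) dependencies. The function name `schedule`, the tuple layout, and the latencies in `PROGRAM` are hypothetical.

```python
def schedule(instructions, out_of_order):
    """Return the cycle at which the last result becomes valid.

    Each instruction is a tuple (dest, sources, latency).
    Assumption: one instruction is decoded per cycle, in program order.
    """
    ready_at = {}     # register name -> cycle at which its value is valid
    last_issue = -1   # issue cycle of the previous instruction (in-order only)
    finish = 0
    for index, (dest, sources, latency) in enumerate(instructions):
        operands_ready = max((ready_at.get(s, 0) for s in sources), default=0)
        if out_of_order:
            # Dynamic issue: wait only for decode and for this instruction's operands.
            issue = max(index, operands_ready)
        else:
            # In-order issue: a stalled instruction also delays everything after it.
            issue = max(last_issue + 1, operands_ready)
        last_issue = issue
        ready_at[dest] = issue + latency
        finish = max(finish, issue + latency)
    return finish


# Hypothetical program: a slow load, one dependent add, two independent adds.
PROGRAM = [
    ("r1", [], 10),      # load r1 (assumed to take 10 cycles, e.g. a cache miss)
    ("r2", ["r1"], 1),   # add that needs r1
    ("r3", [], 1),       # independent add
    ("r4", [], 1),       # independent add
]

in_order_cycles = schedule(PROGRAM, out_of_order=False)
dynamic_cycles = schedule(PROGRAM, out_of_order=True)
```

In this model the independent adds finish while the load is still outstanding when issue is dynamic, whereas with in-order issue they wait behind the stalled dependent add.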


Branch Prediction and Speculative Execution


Although one or more instructions may begin execution during each cycle, each instruction 
takes several (or many) cycles to complete. Thus, when a branch instruction is decoded, its 
branch condition may not yet be known. However, the R10000 processor can predict 
whether the branch is taken, and then continue decoding and executing subsequent 
instructions along the predicted path. 


When a branch prediction is wrong, the processor must back up to the original branch and
take the other path. This technique is called speculative execution. Whenever the processor
discovers a mispredicted branch, it aborts all speculatively-executed instructions and
restores the processor’s state to the state it held before the branch. However, the cache state
is not restored (see the section titled “Side Effects of Speculative Execution”).


Branch prediction can be controlled by the CP0 Diagnostic register. Branch Likely
instructions are always predicted as taken, which also means the instruction in the delay slot
of the Branch Likely instruction will always be speculatively executed. Since the branch
predictor is neither used nor updated by Branch Likely instructions, these instructions do not
affect the prediction of “normal” conditional branches.


Resolving Operand Dependencies 


Operands include registers, memory, and condition bits. Each operand type has its own
dependency logic. In the R10000 processor, dependencies are resolved in the following 
manner: 


* register dependencies are resolved by using register renaming and the
associative comparator circuitry in the queues

* memory dependencies are resolved in the Load/Store Unit

* condition bit dependencies are resolved in the active list and instruction
queues


Resolving Exception Dependencies


In addition to operand dependencies, each instruction is implicitly dependent upon any
previous instruction that generates an exception. Exceptions are caused whenever an
instruction cannot be properly completed, and are usually due to either an untranslated
virtual address or an erroneous operand.


The processor design implements precise exceptions by:

* identifying the instruction which caused the exception

* preventing the exception-causing instruction from graduating

* aborting all subsequent instructions


Thus, all register values remain the same as if instructions were executed singly.
Effectively, all previous instructions are completed, but the faulting instruction and all
subsequent instructions do not modify any values.


Strong Ordering


A multiprocessor system that exhibits the same behavior as a uniprocessor system in a
multiprogramming environment is said to be strongly ordered.


The R10000 processor behaves as if strong ordering is implemented, although it does not 
actually execute all memory operations in strict program order. 


In the R10000 processor, store operations remain pending until the store instruction is ready 
to graduate. Thus, stores are executed in program order, and memory values are precise 
following any exception. 


For improved performance, however, cached load operations may occur in any order, subject
to memory dependencies on pending store instructions. To maintain the appearance of
strong ordering, the processor detects whenever the reordering of a cached load might alter
the operation of the program, backs up, and then re-executes the affected load instructions.
Specifically, whenever a primary data cache block is invalidated due to an external
coherency request, its index is compared with all outstanding load instructions. If there is
a match and the load has been completed, the load is prevented from graduating. When it
is ready to graduate, the entire pipeline is flushed, and the processor is restored to the state
it had before the load was decoded.


An uncached or uncached accelerated load or store instruction is executed when the 
instruction is ready to graduate. This guarantees strong ordering for uncached accesses. 


Since the R10000 processor behaves as if it implemented strong ordering, a suitable system 
design allows the processor to be used to create a shared-memory multiprocessor system 
with strong ordering. 


An Example of Strong Ordering


Given that locations X and Y have no particular relationship—that is, they are not in the 
same cache block—an example of strong ordering is as follows: 


* Processor A performs a store to location X and later executes a load from
location Y.


* Processor B performs a store to location Y and later executes a load from
location X.


The two processors are running asynchronously, and the order of the above two sequences 
is unknown. 


For the system to be strongly ordered, either processor A must load the new value of Y, or 
processor B must load the new value of X, or both processors A and B must load the new 
values of Y and X, respectively, under all conditions. 


If processors A and B both load old values of Y and X, respectively, under any conditions, 
the system is not strongly ordered. 


          New Value                  Strongly
Processor A       Processor B        Ordered
No                No                 No
Yes               No                 Yes
No                Yes                Yes
Yes               Yes                Yes
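

The rule in the table can be checked by brute force. The sketch below is illustrative only: it models a strongly ordered system as one in which the four memory operations take effect in some single global order that preserves each processor's program order, and it enumerates every such order. The names `PROG_A`, `PROG_B`, and `strongly_ordered_outcomes` are hypothetical.

```python
from itertools import permutations

# Each operation is (processor, kind, location), listed in program order.
PROG_A = (("A", "store", "X"), ("A", "load", "Y"))
PROG_B = (("B", "store", "Y"), ("B", "load", "X"))


def in_program_order(order, program):
    """True if the operations of `program` appear in `order` in program order."""
    positions = [order.index(op) for op in program]
    return positions == sorted(positions)


def strongly_ordered_outcomes():
    """Collect (value loaded by A, value loaded by B) over all legal global orders."""
    outcomes = set()
    for order in permutations(PROG_A + PROG_B):
        if not (in_program_order(order, PROG_A) and in_program_order(order, PROG_B)):
            continue
        memory = {"X": "old", "Y": "old"}
        loaded = {}
        for processor, kind, location in order:
            if kind == "store":
                memory[location] = "new"
            else:
                loaded[processor] = memory[location]
        outcomes.add((loaded["A"], loaded["B"]))
    return outcomes


OUTCOMES = strongly_ordered_outcomes()
```

Under this model the pair ("old", "old") never appears in `OUTCOMES`, which corresponds to the single "No" row of the table.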


Chapter 1 Introduction to the R10000 Processor


1.6 R10000 Pipelines


This section describes the stages of the superscalar pipeline.


Instructions are processed in six partially-independent pipelines, as shown in Figure 1-4.
The Fetch pipeline reads instructions from the instruction cache†, decodes them, renames
their registers, and places them in three instruction queues. The instruction queues contain
integer, address calculate, and floating-point instructions. From these queues, instructions
are dynamically issued to the five pipelined execution units.


Stage 1


In stage 1, the processor fetches four instructions each cycle, independent of their
alignment in the instruction cache, except that the processor cannot fetch across a
16-word cache block boundary. These words are then aligned in the 4-word Instruction
register.


If any instructions were left from the previous decode cycle, they are merged with new
words from the instruction cache to fill the Instruction register.


Stage 2


In stage 2, the four instructions in the Instruction register are decoded and renamed.
(Renaming determines any dependencies between instructions and provides precise
exception handling.) When renamed, the logical registers referenced in an instruction are
mapped to physical registers. Integer and floating-point registers are renamed
independently.


A logical register is mapped to a new physical register whenever that logical register is the 
destination of an instruction. Thus, when an instruction places a new value in a logical 
register, that logical register is renamed (mapped) to a new physical register, while its 
previous value is retained in the old physical register. 


As each instruction is renamed, its logical register numbers are compared to determine if 
any dependencies exist between the four instructions decoded during this cycle. After the 
physical register numbers become known, the Physical Register Busy table indicates 
whether or not each operand is valid. The renamed instructions are loaded into integer or 
floating-point instruction queues. 
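

The renaming step described above can be sketched in a few lines. This is a simplified illustration, not the R10000 implementation: the class name `RenameMap`, its methods, and the register counts used in the example are hypothetical, and the return of old physical registers to the free list at graduation is not modeled.

```python
from collections import deque


class RenameMap:
    """Toy logical-to-physical register map with a free list and a busy table."""

    def __init__(self, num_logical, num_physical):
        # Initially logical register i lives in physical register i.
        self.mapping = list(range(num_logical))
        self.free = deque(range(num_logical, num_physical))
        self.busy = [False] * num_physical

    def rename(self, dest, sources):
        """Rename one instruction; return (new_dest, source_physicals, old_dest)."""
        # Sources are looked up first, so an instruction that reads and writes
        # the same logical register still reads the previous value.
        source_physicals = [self.mapping[s] for s in sources]
        old_dest = self.mapping[dest]      # the previous value stays here
        new_dest = self.free.popleft()     # the new value gets a fresh register
        self.mapping[dest] = new_dest
        self.busy[new_dest] = True         # not valid until the result is written
        return new_dest, source_physicals, old_dest

    def complete(self, physical):
        """Mark a physical register as valid once its result has been produced."""
        self.busy[physical] = False


# Example with assumed sizes: 4 logical and 8 physical registers.
rename_map = RenameMap(num_logical=4, num_physical=8)
first = rename_map.rename(dest=1, sources=[1, 2])   # e.g. r1 = r1 + r2
second = rename_map.rename(dest=3, sources=[1])     # e.g. r3 = r1 + 1
```

The second instruction's source resolves to the physical register allocated by the first, which is how the dependency between them is recorded; the busy flag then tells the queue whether that operand is valid yet.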


Only one branch instruction can be executed during stage 2. If the instruction register 
contains a second branch instruction, this branch is not decoded until the next cycle. 


The branch unit determines the next address for the Program Counter; if a branch is taken 
and then reversed, the branch resume cache provides the instructions to be decoded during 
the next cycle. 


† The processor checks only the instruction cache during an instruction fetch; it does not check
the data cache.


Stage 3


In stage 3, decoded instructions are written into the queues. Stage 3 is also the start of each 
of the five execution pipelines. 


Stages 4-6 


In stages 4 through 6, instructions are executed in the various functional units. These units
and their execution process are described below. 


Floating-Point Multiplier (3-stage Pipeline) 


Single- or double-precision multiply and conditional move operations are executed in this
unit with a 2-cycle latency and a 1-cycle repeat rate. The multiplication is completed during
the first two cycles; the third cycle is used to pack and transfer the result.


Floating-Point Divide and Square-Root Units 


Single- or double-precision division and square-root operations can be executed in parallel 
by separate units. These units share their issue and completion logic with the floating-point 
multiplier. 


Floating-Point Adder (3-stage Pipeline) 


Single- or double-precision add, subtract, compare, or convert operations are executed with 
a 2-cycle latency and a 1-cycle repeat rate. Although a final result is not calculated until the 
third pipeline stage, internal bypass paths set a 2-cycle latency for dependent add or 
multiply instructions. 


Integer ALU1 (1-stage Pipeline) 


Integer add, subtract, shift, and logic operations are executed with a 1-cycle latency and a 
1-cycle repeat rate. This ALU also verifies predictions made for branches that are 
conditional on integer register values. 


Integer ALU2 (1-stage Pipeline) 


Integer add, subtract, and logic operations are executed with a 1-cycle latency and a 1-cycle
repeat rate. Integer multiply and divide operations take more than one cycle.


Address Calculation and Translation in the TLB


A single memory address can be calculated every cycle for use by either an integer or
floating-point load or store instruction. Address calculation and load operations can be
performed out of program order.




The calculated address is translated from a 44-bit virtual address into a 40-bit physical 
address using a translation-lookaside buffer. The TLB contains 64 entries, each of which 
can translate two pages. Each entry can select a page size ranging from 4 Kbytes to 16 
Mbytes, inclusive, in powers of 4, as shown in Figure 1-6. 


Figure 1-6    TLB Page Sizes (4 Kbytes, 16 Kbytes, 64 Kbytes, 256 Kbytes, 1 Mbyte, 4 Mbytes, and 16 Mbytes)
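

The page sizes follow directly from the rule stated above (powers of 4 from 4 Kbytes to 16 Mbytes). The sketch below enumerates them and derives the number of page-offset bits for each size. The function names are hypothetical, and the per-entry coverage simply restates that one entry translates two pages.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE


def tlb_page_sizes(smallest=4 * KBYTE, largest=16 * MBYTE):
    """List the selectable page sizes in bytes, stepping by a factor of 4."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 4
    return sizes


def offset_bits(page_size):
    """Number of low-order address bits that select a byte within the page."""
    return page_size.bit_length() - 1


def bytes_per_entry(page_size):
    """Address range covered by one TLB entry, which translates two pages."""
    return 2 * page_size


PAGE_SIZES = tlb_page_sizes()
OFFSET_BITS = [offset_bits(size) for size in PAGE_SIZES]
```

With these definitions there are seven selectable sizes, and the offset width grows by two bits from one size to the next.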


Load instructions have a 2-cycle latency if the addressed data is already within the data 
cache. 


Store instructions do not modify the data cache or memory until they graduate. 


1.7 Implications of R10000 Microarchitecture on Software


The R10000 processor implements the MIPS architecture by using the following 
techniques to improve throughput: 


* superscalar instruction issue

* speculative execution

* non-blocking caches


These microarchitectural techniques have special implications for compilation and code
scheduling.


Superscalar Instruction Issue 


The R10000 processor has parallel functional units, allowing up to four instructions to be
fetched and up to five instructions to be issued or completed each cycle. An ideal code 
stream would match the fetch bandwidth of the processor with a mix of independent 
instructions to keep the functional units as busy as possible. 


To create this ideal mix, every cycle the hardware would select one instruction from each
of the columns below. (Floating-point divide, floating-point square root, integer multiply
and integer divide cannot be started on each cycle.) The processor can look ahead in the
code, so the mix should be kept close to the ideal described below.


Column A    Column B    Column C    Column D    Column E
FPadd       FPmul       FPload      add/sub     add/sub
            FPdiv       FPstore     shift       mul
            FPsqrt      load        branch      div
                        store       logical     logical


Data dependencies are detected in hardware, but limit the degree of parallelism that can be 
achieved. Compilers can intermix instructions from independent code streams. 
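

One way to reason about the instruction mix is to compute a lower bound on issue cycles from the column table. The sketch below is a rough estimate under stated assumptions, not a scheduler: it assumes each column accepts one instruction per cycle, that add/sub and logical operations may go to either Column D or Column E, that four instructions are fetched per cycle, and it ignores data dependencies and the multi-cycle operations noted above. The names `COLUMN_OF` and `min_issue_cycles` are hypothetical.

```python
from math import ceil

# Operation class -> column(s) from the table; "DE" means either integer column.
COLUMN_OF = {
    "fp_add": "A",
    "fp_mul": "B", "fp_div": "B", "fp_sqrt": "B",
    "fp_load": "C", "fp_store": "C", "load": "C", "store": "C",
    "shift": "D", "branch": "D",
    "mul": "E", "div": "E",
    "add_sub": "DE", "logical": "DE",
}


def min_issue_cycles(ops, fetch_width=4):
    """Lower bound on the cycles needed to issue `ops`, ignoring dependencies."""
    counts = {"A": 0, "B": 0, "C": 0, "D": 0, "E": 0, "DE": 0}
    for op in ops:
        counts[COLUMN_OF[op]] += 1
    integer_total = counts["D"] + counts["E"] + counts["DE"]
    return max(
        counts["A"], counts["B"], counts["C"],
        counts["D"], counts["E"],
        ceil(integer_total / 2),       # two integer columns share the flexible ops
        ceil(len(ops) / fetch_width),  # fetch bandwidth limit
    )


balanced = min_issue_cycles(["fp_add", "fp_mul", "load", "add_sub"])
memory_bound = min_issue_cycles(["load", "load", "load", "load"])
integer_bound = min_issue_cycles(["add_sub", "add_sub", "add_sub", "add_sub"])
```

A balanced group of four can issue in a single cycle under these assumptions, while four loads are limited by the single Column C slot.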


Speculative Execution


Speculative execution increases parallelism by fetching, issuing, and completing
instructions even in the presence of unresolved conditional branches and possible 
exceptions. Following are some suggestions for increasing program efficiency: 


* Compilers should reduce the number of branches as much as possible.

* “Jump Register” instructions should be avoided.

* Aggressive use of the new integer and floating-point conditional move
instructions is recommended.

* Branch prediction rates may be improved by organizing code so that each
branch goes the same direction most of the time, since a branch that is taken
50% of the time has higher average cost than one taken 90% of the time. The
MIPS IV conditional move instructions may be effective in improving
performance by replacing unpredictable branches.
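

The comparison between a 50% and a 90% branch can be made concrete with a simple expected-cost calculation. The sketch below assumes an idealized predictor that always guesses the more common direction, and a fixed misprediction penalty; the actual R10000 predictor and penalty are not modeled, and the 10-cycle figure is an arbitrary placeholder.

```python
def expected_mispredict_cycles(p_taken, penalty_cycles):
    """Average cycles lost per branch for a predictor that guesses the majority direction."""
    mispredict_rate = min(p_taken, 1.0 - p_taken)
    return mispredict_rate * penalty_cycles


ASSUMED_PENALTY = 10  # placeholder value, not an R10000 figure

cost_even = expected_mispredict_cycles(0.5, ASSUMED_PENALTY)
cost_biased = expected_mispredict_cycles(0.9, ASSUMED_PENALTY)
```

Under these assumptions the evenly split branch costs about five times as much on average as the 90% branch, which is the case where replacing the branch with a conditional move is most attractive.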


Side Effects of Speculative Execution 


To improve performance, R10000 instructions can be speculatively fetched and executed.
Side-effects are harmless in cached coherent operations; however, there are potential
side-effects with non-coherent cached operations. These side-effects are described in the
sections that follow.


Speculatively fetched instructions and speculatively executed loads or stores to a cached
address initiate a Processor Block Read Request to the external interface if they miss in the
cache. The speculative operation may modify the cache state and/or data, and this
modification may not be reversed even if the speculation turns out to be incorrect and the
instruction is aborted.


Speculative Processor Block Read Request to an I/O Address 


Accesses to I/O addresses often cause side-effects. Typically, such I/O addresses are
mapped to an uncached region, and uncached reads and writes are made as double/single/
partial-word reads and writes (non-block reads and writes) in the R10000. Uncached reads and
writes are guaranteed to be non-speculative.


However, if the R10000 has a “garbage” value in a register, a speculative block read request to
an unpredictable physical address can occur if the processor speculatively fetches data due to a
Load or Jump Register instruction specifying this register. Therefore, speculative block accesses
to load-sensitive I/O areas can present an unwanted side-effect.


Unexpected Write Back Due to Speculative Store Instruction


When a Store instruction is speculated and the target address of the speculative Store
instruction is missing in the cache, the cache line is refilled and the state is marked to be
Dirty. However, the refilled data may not be actually changed in the cache if this store
instruction is later aborted. This could present a side-effect in cases such as the one
described below:


* The processor is storing data sequentially to memory area A, using a code-loop
that includes Store and conditional branch instructions.

* A DMA write operation is performed to memory area B.

* DMA area B is contiguous to the sequential storage area A.

* The DMA operation is noncoherent.

* The processor does not cache any lines of DMA area B.


If the processor and the DMA operations are performed in sequence, the following could 
occur: 


1. Due to speculative execution at the exit of the code-loop, the line of data beyond the
end of the memory area A (that is, the starting line of memory area B) is refilled
to the cache. This cache line is then marked Dirty.

2. The DMA operation starts writing noncoherent data into memory area B.

3. A cache line replacement is caused by later activities of the processor, in which the
cache line is written back to the top of area B. Thus, the first line of the DMA area B
is overwritten by old cache data, resulting in incorrect DMA operation and data.


The OS can restrict the writable pages for each user process and so can prevent a user
process from interfering with an active DMA space. The kernel, on the other hand, retains
xkphys and kseg0 addresses in registers. There is no write protection against the speculative
use of the address values in these registers. User processes which have pages mapped to
physical spaces not in RAM may also have side-effects. These side-effects can be avoided
if DMA is coherent.


Speculative Instruction Fetch 


The change in a cache line’s state due to a speculative instruction fetch is not reversed if the
speculation is aborted. This does not cause any problems visible to the program except
during a noncoherent memory operation. Then the following side-effect exists: if a
noncoherent line is changed to Clean Exclusive and this line is also present in noncoherent
space, the noncoherent data could be modified by an external component and the processor
would then have stale data.


Workarounds for Noncoherent Cached Systems


The suggestions presented below are not exhaustive; the solutions and trade-offs are system
dependent. Any one or more of the items listed below might be suitable in a particular
system, and testing and simulations should be used to verify their efficacy.


1. The external agent can reject a processor block read request to any I/O location in
which a speculative load would cause an undesired effect. Rejection is made by
returning an external NACK completion response.


2. A serializing instruction such as a cache barrier or a CP0 instruction can be used to
prevent speculation beyond the point where speculative stores are allowed to occur.
This could be at the beginning of a basic block that includes instructions that can cause
a store with an unsafe pointer. (Stores to addresses like stack-relative,
global-pointer-relative and pointers to non-I/O memory might be safe.) Speculative loads can also
cause a side-effect. To make sure there is no stale data in the cache as a result of
undesired speculative loads, portions of the cache referred to by the address of the DMA
read buffers could be flushed after every DMA transfer from the I/O devices.


3. Make references to appropriate I/O spaces uncached by changing the cache coherency
attribute in the TLB.


4. Generally, arbitrary accesses can be controlled by mapping selected addresses through
the TLB. However, references to an unmapped cached xkphys region could have
hazardous effects on I/O. A solution for this is given below:


First of all, note that the xkphys region is hard-wired into cached and uncached regions,
whereas the cache attributes for the kseg0 region are programmed through the Config
register. Therefore, clear the KX bit (to a zero) and set (to ones) the SX and UX bits in
the Status register. This disables access to the xkphys region and restricts access to only
the User and Supervisor portions of the 64-bit address space.


In general, the system needs either a coherent or a noncoherent protocol, but not
both. Therefore these cache attributes can be used by the external hardware to filter
accesses to certain parts of the kseg0 region. For instance, the cache attributes for the
kseg0 address space might be defined in the Config register to be cache coherent while
the cache attributes in the TLB for the rest of virtual space are defined to be
cached-noncoherent or uncached. The external hardware could be designed to reject all cache
coherent mode references to the memory except to that prior-defined safe space in
kseg0 within which there is no possibility of an I/O DMA transfer. Then before the
DMA read process and before the cache is flushed for the DMA read buffers, the cache
attributes in the TLB for the I/O buffer address space are changed from noncoherent
to uncached. After the DMA read, the access modes are returned to the
cached-noncoherent mode.


5. Just before a load/store instruction, use a conditional move instruction which tests for the
reverse condition in the speculated branch, and make all aborted branch assignments
safe. An example is given below:

    bne   r1, r0, label
    movn  ra, r0, r1      # test to see if r1 != 0; if r1 != 0 then branch
                          # is mispredicted; move safe address (r0)
                          # into ra
    ld    r4, 0(ra)       # Without the previous movn, this ld
                          # could create damaging read.


In the above example, without the MOVN the read to the address in register ra could
be speculatively executed and later aborted. It is possible that this load could be
premature and thus damaging. The MOVN guarantees that if there is a misprediction
(r1 is not equal to 0) ra will be loaded with an address to which a read will not be
damaging.


6. The following is similar to the conditional-move example given above, in that it
protects speculation only for a single branch, but in some instances it may be more
efficient than either the conditional move or the cache barrier workarounds.


This workaround uses the fact that branch-likely instructions are always predicted as
taken by the R10000. Thus, any incorrect speculation by the R10000 on a
branch-likely always occurs on a taken path. Sample code is:

    beql  rx, r1, label
    nop
    sw    r2, 0x0(r1)


The store to r1 will never be to an address referred to by the content of rx, because the
store will never be executed speculatively. Thus, the address referred to by the content
of rx is protected from any spurious write-backs.


This workaround is most useful when the branch is often taken, or when there are few
instructions in the protected block that are not memory operations. Note that no
instructions in a block following a branch-likely will be initiated by speculation on that
branch; however, in the case of a serial instruction workaround, only memory
operations are prevented from speculative initiation. In the case of the
conditional-move workaround, speculative initiation of all instructions continues unimpeded. Also,
similar to the conditional-move workaround, this workaround only protects
fall-through blocks from speculation on the immediately preceding branch. Other
mechanisms must be used to ensure that no other branches speculate into the protected
block. However, if a block that dominates† the fall-through block can be shown to be
protected, this may be sufficient. Thus, if block (a) dominates block (b), and block (b)
is the fall-through block shown above, and block (a) is the immediately previous block
in the program (i.e., only the single conditional branch that is being replaced intervenes
between (a) and (b)), then ensuring that (a) is protected by serial instruction means a
branch-likely can safely be used as protection for (b).


Nonblocking Caches


As processor speed increases, the processor’s data latency and bandwidth requirements rise 
more rapidly than the latency and bandwidth of cost-effective main memory systems. The 
memory hierarchy of the R10000 processor tries to minimize this effect by using large set- 
associative caches and higher bandwidth cache refills to reduce the cost of loads, stores, and 
instruction fetches. Unlike the R4400, the R10000 processor does not stall on data cache
misses; instead, it defers execution of any dependent instructions until the data has been
returned, and continues to execute independent instructions (including other memory
operations that may miss in the cache). Although the R10000 allows a number of
outstanding primary and secondary cache misses, compilers should organize code and data 
to reduce cache misses. When cache misses are inevitable, the data reference should be 
scheduled as early as possible so that the data can be fetched in parallel with other unrelated 
operations. 


As a further antidote to cache miss stalls, the R10000 processor supports prefetch 
instructions, which serve as hints to the processor to move data from memory into the 
secondary and primary caches when possible. Because prefetches do not cause dependency 
stalls or memory management exceptions, they can be scheduled as soon as the data address 
can be computed, without affecting exception semantics. Indiscriminate use of prefetch 
instructions can slow program execution because of the instruction-issue overhead, but 
selective use of prefetches based on compiler miss prediction can yield significant 
performance improvement for dense matrix computations. 


† In compiler parlance, block (a) dominates block (b) if and only if every time block (b) is
executed, block (a) is executed first. Note that block (a) does not have to immediately precede
block (b) in execution order; some other block may intervene.


As it executes programs, the R10000 superscalar processor performs many operations in
parallel. Instructions can also be executed out of order. Together, these two facts greatly 
improve performance, but they also make it difficult to predict the time required to execute 
any section of a program, since it often depends on the instruction mix and the critical 
dependencies between instructions. 


The processor has five largely independent execution units, each of which is
specialized for a specific class of instructions. Any one of these units may limit
processor performance, even as the other units sit idle. If this occurs, instructions which
use the idle units can be added to the program without adding any appreciable delay.




User Instruction Latency and Repeat Rate 


Table 1-1 shows the latencies and repeat rates for all user instructions executed in ALU1, 
ALU2, Load/Store, Floating-Point Add and Floating-Point Multiply functional units 
(definitions of latency and repeat rate are given in the Glossary). Kernel instructions are 
not included, nor are control instructions not issued to these execution units. 


Table 1-1    Latencies and Repeat Rates for User Instructions


Instruction Type                    Execution Unit   Latency   Repeat Rate   Comment

Integer Instructions
Add/Sub/Logical/Set                 ALU 1/2          1         1
MF/MT HI/LO                         ALU 1/2          1         1
Shift/LUI                           ALU 1            1         1
Cond. Branch Evaluation             ALU 1            1         1
Cond. Move                          ALU 1            1         1
MULT                                ALU 2            5/6       6             Latency relative to Lo/Hi
MULTU                               ALU 2            6/7       7             Latency relative to Lo/Hi
DMULT                               ALU 2            9/10      10            Latency relative to Lo/Hi
DMULTU                              ALU 2            10/11     11            Latency relative to Lo/Hi
DIV/DIVU                            ALU 2            34/35     35            Latency relative to Lo/Hi
DDIV/DDIVU                          ALU 2            66/67     67            Latency relative to Lo/Hi
Load (not including loads to CP1)   Load/Store       2         1             Assuming cache hit
Store                               Load/Store       -         1             Assuming cache hit

Floating-Point Instructions
MTC1/DMTC1                          ALU 1            3         1
Add/Sub/Abs/Neg/Round/Trunc/
Ceil/Floor/C.cond                   FADD             2         1
CVT.S.W/CVT.S.L                     FADD             4         2             Repeat rate is on average
CVT (others)                        FADD             2         1
Mul                                 FMPY             2         1
MFC1/DMFC1                          FMPY             2         1
Cond. Move/Move                     FMPY             2         1
DIV.S/RECIP.S                       FMPY             12        14
DIV.D/RECIP.D                       FMPY             19        21
SQRT.S                              FMPY             18        20
SQRT.D                              FMPY             33        35
RSQRT.S                             FMPY             30        20
RSQRT.D                             FMPY             52        35
MADD                                                                         Latency is 2 only if the result is used as the
                                                                             operand specified by fr of another MADD
LWC1/LDC1/LWXC1/LDXC1               Load/Store                               Assuming cache hit


Please note the following about Table 1-1:


* For integer instructions, conditional trap evaluation takes a single cycle,
like conditional branches.

* Branches and conditional moves are not conditionally issued.

* The repeat rate above for Load/Store does not include Load Linked and
Store Conditional.

* The Prefetch instruction is not included here.

* The latency for multiplication and division depends upon the next
instruction.

* An instruction using register Lo can be issued one cycle earlier than one
using Hi.

* For floating-point instructions, CP1 branches are evaluated in the
Graduation Unit.

* CTC1 and CFC1 are not included in this table.

* The repeat pattern for the CVT.S.(W/L) is “II x x II x x ...”; the repeat
rate given here, 2, is the average.

* The latency for MADD instructions is 2 cycles if the result is used as the
operand specified by fr of the second MADD instruction.

* Load Linked and Store Conditional instructions (LL, LLD, SC, and SCD)
do not implicitly perform SYNC operations in the R10000. Any of the
following events that occur between a Load Linked and a Store
Conditional will cause the Store Conditional to fail: an exception;
execution of an ERET, a load, a store, a SYNC, a CacheOp, a prefetch, or
an external intervention/invalidation on the block containing the linked
address. Instruction cache misses do not cause the Store Conditional to
fail.

* Up to four branches can be evaluated in one cycle.†


For more information about implementations of the LL, SC, and SYNC instructions, please 
see the section titled, R10000-Specific CPU Instructions, in this chapter. 
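

The latency and repeat-rate columns of Table 1-1 can be combined into a rough cycle estimate for a sequence of operations in one functional unit. The sketch below assumes that latency is the number of cycles from issue until a dependent instruction can use the result, and that repeat rate is the number of cycles between successive issues to the same unit; the Glossary definitions should be checked before relying on this. The function names are hypothetical, and other units, cache misses, and graduation are ignored.

```python
def dependent_chain_cycles(count, latency):
    """Cycles for `count` operations where each one needs the previous result."""
    return count * latency


def independent_stream_cycles(count, latency, repeat_rate):
    """Cycles for `count` independent operations issued back to back in one unit."""
    if count == 0:
        return 0
    # The last operation issues after (count - 1) repeat intervals,
    # and its result is available one latency later.
    return (count - 1) * repeat_rate + latency


# Values taken from Table 1-1: Mul (FMPY) is 2/1, DIV.D (FMPY) is 19/21.
mul_chain = dependent_chain_cycles(8, latency=2)
mul_stream = independent_stream_cycles(8, latency=2, repeat_rate=1)
div_stream = independent_stream_cycles(4, latency=19, repeat_rate=21)
```

The gap between the chained and streamed multiply estimates shows why interleaving independent operations helps fully pipelined units, while the divide estimate is dominated by the repeat rate either way.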


† Only one branch can be decoded at any particular cycle. Since each conditional branch is
predicted, the real direction of each branch must be “evaluated.” For example,

    beq  r2, r3, L1
    nop

A comparison of r2 and r3 is made to determine whether the branch is taken or not. If the
branch prediction is correct, the branch instruction is graduated. Otherwise, the processor
must back out of the instruction stream decoded after this branch, and inform the IFetch to
fetch the correct instructions. The evaluation is made in the ALU for integer branches and in
the Graduation Unit for floating-point branches. A single integer branch can be evaluated
during any cycle, but there may be up to 4 condition codes waiting to be evaluated for
floating-point branches. Once the condition code is evaluated, all dependent FP branches can be
evaluated during the same cycle.


Other Performance Issues


Table 1-1 shows execution times within the functional units only. Performance may also
be affected by instruction fetch times, and especially by the execution of conditional
branches.


In an effort to keep the execution units busy, the processor predicts branches and
speculatively executes instructions along the predicted path. When the branch is predicted
correctly, this significantly improves performance: for typical programs, branch prediction
is 85% to 90% correct. When a branch is mispredicted, the processor must discard
instructions which were speculatively fetched and executed. Usually, this effort uses
resources which otherwise would have been idle; however, in some cases speculative
instructions can delay previous instructions.


Cache Performance


The execution of load and store instructions can greatly affect performance. These 
instructions are executed quickly if the required memory block is contained in the primary 
data cache, otherwise there are significant delays for accessing the secondary cache or main 
memory. Out-of-order execution and non-blocking caches reduce the performance loss due 
to these delays, however. 


The latency and repeat rates for accessing the secondary cache are summarized in
Table 1-2. These rates depend on the ratio of the secondary cache’s clock to the processor’s
internal pipeline clock. The best performance is achieved when the clock rates are equal;
slower external clocks add to latency and repeat times.


The primary data cache contains 8-word blocks, which are refilled using 2-cycle transfers 
from the quadword-wide secondary cache. Latency runs to the time in which the processor 
can use the addressed data. 


The primary instruction cache contains 16-word blocks, which are refilled using 4-cycle
transfers.

Table 1-2 Latency and Repeat Rates for Secondary Cache Reads

SCClkDiv   Latency‡          Repeat Rate*
Mode       (PClk Cycles)     (PClk Cycles)
1          6                 2 (data cache)
                             4 (instruction cache)
1.5        [illegible]†      3 (data cache)
                             6 (instruction cache)
2          [illegible]†      4 (data cache)
                             8 (instruction cache)


‡ Assumes the cache way was correctly predicted, and there are no conflicting requests.


* Repeat rate = PClk cycles needed to transfer 2 quadwords (data cache) or 4 quadwords (instruction
cache). Rate is valid for bursts of 2 to 3 cache misses; if more than three cache misses occur in a row,
there can be a 1-cycle “bubble.”


† Clock synchronization causes variability.
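
The repeat rates in Table 1-2 can be related to the refill sizes given above: a data cache block (8 words) is 2 quadwords and an instruction cache block (16 words) is 4 quadwords. The following Python sketch works through that arithmetic. It assumes that one quadword is transferred per secondary cache clock and that the clock ratios of interest are 1, 1.5, and 2; the function and constant names are illustrative and do not come from the manual.

```python
from fractions import Fraction

WORD_BYTES = 4        # one word is 32 bits
QUADWORD_BYTES = 16   # the secondary cache data bus is one quadword (128 bits) wide

def quadwords_per_block(block_words):
    """Number of quadword transfers needed to refill one primary cache block."""
    return block_words * WORD_BYTES // QUADWORD_BYTES

def repeat_rate(block_words, sc_clk_div):
    """PClk cycles per block, assuming one quadword per secondary cache clock."""
    return quadwords_per_block(block_words) * Fraction(sc_clk_div)

DATA_BLOCK_WORDS = 8      # primary data cache block size
INSTR_BLOCK_WORDS = 16    # primary instruction cache block size

# Assumed clock ratios (PClk cycles per secondary cache clock).
for div in (Fraction(1), Fraction(3, 2), Fraction(2)):
    print(float(div),
          repeat_rate(DATA_BLOCK_WORDS, div),
          repeat_rate(INSTR_BLOCK_WORDS, div))
```

Under these assumptions the sketch yields 2/4, 3/6, and 4/8 PClk cycles for the data and instruction caches, which agrees with the repeat rate column of the table.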
The processor mitigates access delays to the secondary cache in the following ways:


• The processor can execute up to 16 load and store instructions speculatively
  and out-of-order, using non-blocking primary and secondary caches. That is,
  it looks ahead in its instruction stream to find load and store instructions
  which can be executed early; if the addressed data blocks are not in the
  primary cache, the processor initiates cache refills as soon as possible.


• If a speculatively executed load initiates a cache refill, the refill is completed
  even if the load instruction is aborted. It is likely the data will be referenced
  again.


• The data cache is interleaved between two banks, each of which contains
  independent tag and data arrays. These four sections can be allocated
  separately to achieve high utilization. Five separate circuits compete for
  cache bandwidth (address calculate, tag check, load unit, store unit, external
  interface).


• The external interface gives priority to its refill and interrogate operations.
  The processor can execute tag checks, data reads for load instructions, or data
  writes for store instructions. When the primary cache is refilled, any required
  data can be streamed directly to waiting load instructions.


• The external interface can handle up to four non-blocking memory accesses to
  secondary cache and main memory.


Main memory typically has much longer latencies and lower bandwidth than the secondary
cache, which makes it difficult for the processor to mitigate their effect. Since main memory
accesses are non-blocking, delays can be reduced by overlapping the latency of several
operations. However, although the first part of the latency may be concealed, the processor
cannot look far enough ahead to hide the entire latency.
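
The benefit of overlapping can be illustrated with a deliberately simple model. The Python sketch below assumes independent misses that are issued in batches, with each batch costing one full memory latency; the limit of four outstanding accesses comes from the external interface description above, while the 100-cycle latency is a hypothetical figure chosen only for the arithmetic.

```python
import math

def total_miss_time(num_misses, latency, max_outstanding=4):
    """Cycles to complete independent misses when up to max_outstanding overlap.

    Simplified model: misses are issued in batches of max_outstanding, and
    each batch costs one full latency.
    """
    batches = math.ceil(num_misses / max_outstanding)
    return batches * latency

LATENCY = 100  # hypothetical main memory latency in cycles (not from the manual)

serial = total_miss_time(8, LATENCY, max_outstanding=1)      # one access at a time
overlapped = total_miss_time(8, LATENCY, max_outstanding=4)  # four accesses in flight
print(serial, overlapped)
```

In this model eight misses cost 800 cycles when serviced one at a time and 200 cycles when four are overlapped; the remaining 200 cycles are the part of the latency that cannot be hidden.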


Programmers may use prefetch instructions to load data into the caches before it is needed,
greatly reducing main memory delays for programs which access memory in a predictable
sequence.


2.


System Configurations 


The R10000 processor provides the capability for a wide range of computer systems; this 
chapter describes some of the uni- and multiprocessor alternatives. 


2.1 Uniprocessor Systems


In a typical uniprocessor system, the System interface of the R10000 processor connects in
a point-to-point fashion with an external agent. Such a system is shown in Figure 2-1. The 
external agent is typically an ASIC that provides a gateway to the memory and I/O 
subsystems; in fact, this ASIC may incorporate the memory controller itself. 


If hardware I/O coherency is desired, the external agent may use the multiprocessor 
primitives provided by the processor to maintain cache coherency for interventions and 
invalidations. External duplicate tags can be used by the external agent to filter external 
coherency requests. 


[Block diagram: the secondary cache attaches to the R10000 through the Secondary Cache
interface; the R10000 System interface connects to the external agent (with duplicate tags),
which leads to other system resources.]


Figure 2-1 Uniprocessor System Organization


2.2 Multiprocessor Systems


Two types of multiprocessor systems can be implemented with the R10000 processor:

• a dedicated external agent interfaces with each R10000 processor
• up to four R10000 processors and an external agent reside on a cluster bus


Multiprocessor Systems Using Dedicated External Agents 


A multiprocessor system may be created with R10000 processors by providing a dedicated
external agent for each processor; such a system is shown in Figure 2-2. The external agent
provides a path between the processor System interface and some type of coherent
interconnect. In such a system, the processor provides support for three coherency schemes:

• snoopy-based
• snoopy-based with external duplicate tags and control
• directory-based with external directory structure and control


[Block diagram: two R10000 processors, each with its own secondary cache on the Secondary
Cache interface and its own external agent (with duplicate tags) on the System interface. The
external agents attach to a coherent interconnect with a directory structure, which leads to
other system resources.]


Figure 2-2 Multiprocessor System Organization using Dedicated External Agents


Multiprocessor Systems Using a Cluster Bus


A multiprocessor system may be created with R10000 processors by using a cluster bus 
configuration. Such a system is shown in Figure 2-3. A cluster bus is created by attaching 
the System interfaces of up to four R10000 processors with an external agent (the cluster 
coordinator). The cluster coordinator is responsible for managing the flow of data within 
the cluster. 


This organization can reduce the number of ASICs and the pin count needed for a small
multiprocessor system.


The cluster bus protocol supports three coherency schemes:

• snoopy-based
• snoopy-based with external duplicate tags and control
• directory-based with external directory structure and control


[Block diagram: R10000 processors, each with its own secondary cache on the Secondary Cache
interface, attach their System interfaces to the cluster bus. The cluster coordinator (with
tags and a directory structure) also sits on the cluster bus and leads to other system
resources.]


Figure 2-3 Multiprocessor System Organization Using the Cluster Bus


Interface Signal Descriptions


This chapter gives a list and description of the interface signals. 

The R10000 interface signals may be divided into the following groups:

• Power interface
• Secondary Cache interface
• System interface
• Test interface


The following sections present a summary of the external interface signals for each of these 
groups. An asterisk (*) indicates signals that are asserted as a logical 0. 


3.1 Power Interface Signals


Table 3-1 presents the R10000 processor power interface signals. 


Table 3-1 Power Interface Signals


Signal Name   Description                                                          Type

Vcc           Vcc core                                                             Input
              Vcc for the core circuits.
VccQSC        Vcc output driver secondary cache                                    Input
              Vcc for the secondary cache interface output drivers.
VccQSys       Vcc output driver system                                             Input
              Vcc for the System interface output drivers.
VrefSC        Voltage reference secondary cache                                    Input
              Voltage reference for the secondary cache interface input
              receivers.
VrefSys       Voltage reference system                                             Input
              Voltage reference for the System interface input receivers.
VrefByp       Voltage reference bypass                                             Input
              This pin must be tied to Vss (preferably) or VrefSys, through
              at least a 100 ohm resistor.
Vss           Vss                                                                  Input
              Vss for the core circuits and output drivers.
VccPa         Vcc PLL analog                                                       Input
              Vcc for the PLL analog circuits.
VssPa         Vss PLL analog                                                       Input
              Vss for the PLL analog circuits.
VccPd         Vcc PLL digital                                                      Input
              Vcc for the PLL digital circuits.
VssPd         Vss PLL digital                                                      Input
              Vss for the PLL digital circuits.
DCOk          DC voltages are OK                                                   Input
              The external agent asserts these two signals when Vcc,
              VccQ[SC,Sys], Vref[SC,Sys], Vcc[Pa,Pd], and SysClk are stable.

Errata 

VrefByp description changed in Table 3-1.


3.2 Secondary Cache Interface Signals


Table 3-2 presents the R10000 processor secondary cache interface signals.


Errata


In Table 3-2, the description of SCBAddr(18:0) is revised.


Table 3-2 Secondary Cache Interface Signals


Signal Name      Description                                                       Type

SSRAM‡ Clock Signals
SCClk(5:0)       Secondary cache clock                                             Output
SCClk*(5:0)      Duplicated complementary secondary cache clock outputs.

SSRAM Address Signals
SCAAddr(18:0)    Secondary cache address bus                                       Output
SCBAddr(18:0)    SCBAddr is the complementary SCAAddr 19-bit bus, which
                 specifies the set address of the secondary cache data and
                 tag SSRAM that is to be accessed.
SCTagLSBAddr     Secondary cache tag LSB address                                   Output
                 Signal that specifies the least significant bit of the
                 address for the secondary cache tag SSRAM.

SSRAM Data Signals
SCADWay          Secondary cache data way                                          Output
SCBDWay          Duplicated signal that indicates the way of the secondary
                 cache data SSRAM that is to be accessed.
SCData(127:0)    Secondary cache data bus                                          Bidirectional
                 128-bit bus to read/write cache data from/to secondary
                 cache data SSRAM.
SCDataChk(9:0)   Secondary cache data check bus                                    Bidirectional
                 A 10-bit bus used to read/write ECC and even parity from/to
                 the secondary cache data SSRAM.
SCADOE*          Secondary cache data output enable                                Output
SCBDOE*          Duplicated signal that enables the outputs of the secondary
                 cache data SSRAM.
SCADWr*          Secondary cache data write enable                                 Output
SCBDWr*          Duplicated signal that enables writing the secondary cache
                 data SSRAM.
SCADCS*          Secondary cache data chip select                                  Output
SCBDCS*          Duplicated signal that enables the secondary cache data
                 SSRAM.

SSRAM Tag Signals
SCTWay           Secondary cache tag way                                           Output
                 Signal indicating the way of the secondary cache tag SSRAM
                 to be accessed.
SCTag(25:0)      Secondary cache tag bus                                           Bidirectional
                 A 26-bit bus to read/write cache tags from/to the secondary
                 cache tag SSRAM.
SCTagChk(6:0)    Secondary cache tag check bus                                     Bidirectional
                 A 7-bit bus used to read/write ECC from/to the secondary
                 cache tag SSRAM.
SCTOE*           Secondary cache tag output enable                                 Output
                 A signal that enables the outputs of the secondary cache tag
                 SSRAM.
SCTWr*           Secondary cache tag write enable                                  Output
                 A signal that enables writing the secondary cache tag SSRAM.
SCTCS*           Secondary cache tag chip select                                   Output
                 A signal which enables the secondary cache tag SSRAM.


‡ All cache static RAM (SRAM) are synchronous SRAM (SSRAM).


3.3 System Interface Signals


Table 3-3 presents the R10000 processor System interface signals. 


Table 3-3 System Interface Signals


Signal Name      Description                                                       Type

System Clock Signals
SysClk           System clock                                                      Input
SysClk*          Complementary system clock input.
SysClkRet        System clock return                                               Output
SysClkRet*       Complementary system clock return output used for
                 termination of the system clock.

System Arbitration Signals
SysReq*          System request                                                    Output
                 The processor asserts this signal when it wants to perform a
                 processor request and it is not already master of the System
                 interface.
SysGnt*          System grant                                                      Input
                 The external agent asserts this signal to grant mastership of
                 the System interface to the processor.
SysRel*          System release                                                    Bidirectional
                 The master of the System interface asserts this signal for
                 one SysClk cycle to indicate that it will relinquish
                 mastership of the System interface in the following SysClk
                 cycle.

System Flow Control Signals
SysRdRdy*        System read ready                                                 Input
                 The external agent asserts this signal to indicate that it
                 can accept processor read and upgrade requests.
SysWrRdy*        System write ready                                                Input
                 The external agent asserts this signal to indicate that it
                 can accept processor write and eliminate requests.

System Address/Data Bus Signals
SysCmd(11:0)     System command                                                    Bidirectional
                 A 12-bit bus for transferring commands between processor and
                 the external agent.
SysCmdPar        System command bus parity                                         Bidirectional
                 Odd parity for the system command bus.
SysAD(63:0)      System address/data bus                                           Bidirectional
                 A 64-bit bus for transferring addresses and data between
                 R10000 and the external agent.
SysADChk(7:0)    System address/data check bus                                     Bidirectional
                 An 8-bit ECC bus for the system address/data bus.
SysVal*          System valid                                                      Bidirectional
                 The master of the System interface asserts this signal when
                 it is driving valid information on the system command and
                 system address/data buses.

System State Bus Signals
SysState(2:0)    System state bus                                                  Output
                 A 3-bit bus used for issuing processor coherency state
                 responses and also additional status indications.
SysStatePar      System state bus parity                                           Output
                 Odd parity for the system state bus.
SysStateVal*     System state bus valid                                            Output
                 The processor asserts this signal for one SysClk cycle when
                 issuing a processor coherency state response on the system
                 state bus.

System Response Bus Signals
SysResp(4:0)     System response bus                                               Input
                 A 5-bit bus used by the external agent for issuing external
                 completion responses.
SysRespPar       System response bus parity                                        Input
                 Odd parity for the system response bus.
SysRespVal*      System response bus valid                                         Input
                 The external agent asserts this signal for one SysClk cycle
                 when issuing an external completion response on the system
                 response bus.

System Miscellaneous Signals
SysReset*        System reset                                                      Input
                 The external agent asserts this signal to reset the
                 processor.
SysNMI*          System non-maskable interrupt                                     Input
                 The external agent asserts this signal to indicate a
                 non-maskable interrupt.
SysCorErr*       System correctable error                                          Output
                 The processor asserts this signal for one SysClk cycle when a
                 correctable error is detected and corrected.
SysUncErr*       System uncorrectable error                                        Output
                 The processor asserts this signal for one SysClk cycle when
                 an uncorrectable tag error is detected.
SysGblPerf*      System globally performed                                         Input
                 The external agent asserts this signal to indicate that all
                 processor requests have been globally performed with respect
                 to all external agents.
SysCyc*          System cycle                                                      Input
                 The external agent may use this signal to define a virtual
                 System interface clock in a hardware emulation environment.


3.4 Test Interface Signals


Table 3-4 presents the R10000 processor test interface signals.


Errata


PLLDis and SelDVCO signal descriptions are revised in Table 3-4.


Table 3-4 Test Interface Signals


PLLDis


This signal must be tied to Vcc. 


<R12000> 
Changed Spare (1, 3) pins to NC (No Connection) 


Signal Name      Description                                                       Type

JTAG Signals
JTDI             JTAG serial data input                                            Input
                 Serial data input.
JTDO             JTAG serial data output                                           Output
                 Serial data output.
JTCK             JTAG clock                                                        Input
                 Clock input.
JTMS             JTAG mode select                                                  Input
                 Mode select input.

Miscellaneous Test Signals
TCA              Testability control A (for manufacturing test only)               Input
                 This signal must be tied to Vss, through a 100 ohm resistor.
TCB              Testability control B (for manufacturing test only)               Input
                 This signal must be tied to Vss, through a 100 ohm resistor.
PLLDis           PLL disable (for manufacturing test only)                         Input
                 This signal must be tied to Vss through a 100 ohm resistor.
PLLRC            PLL Control Node (for manufacturing test only)
                 There must be no connection made to this signal.
PLLSpare(1:4)    These four pins must be tied to Vss.
Spare(1,3)       These two pins must be tied to Vss, through a 100 ohm
                 resistor.
3-State          3-state Control                                                   Input
                 The system asserts this signal to 3-state all outputs and
                 input/output pads except for SCClk, SCClk*, and JTDO.
SelDVCO          Select differential VCO (for manufacturing test only)             Input

The Spare(1,3) pins, shown in this manual (page 63) as tied to Vss through a 100 ohm
resistor, are used in the R12000 for diagnostic purposes and thus, for the R12000, should
not be connected to anything.


Unused Inputs


Errata 




Several input pins are unused during normal system operation, and should be tied to Vcc
through resistors:

• JTDI
• JTCK
• JTMS


Several input pins are unused during normal system operation, and should be tied to Vss
through 100 ohm resistors:

• TCA, TCB
• PLLDis
• Spare1, Spare3


Several input pins are unused during normal system operation, and should be tied to Vss:

• PLLSpare1, PLLSpare2, PLLSpare3, PLLSpare4
• SelDVCO


The following input pins may be unused in certain system configurations, and each of them
should be tied to VccQSys, preferably through a resistor of 100 ohms or greater value:

• SysNMI*


The following input pins may be unused in certain system configurations, and each of them
should be tied to Vss, preferably through a resistor of 100 ohms or greater value:

• SysRdRdy*
• SysWrRdy*
• SysGblPerf*
• SysCyc*


The following input pins may be unused in certain system configurations, and each of them
should be tied (preferably) to Vss, or to VccQSys, through a resistor of 100 ohms or greater
value:

• SysADChk(7:0)
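

The tie-off recommendations above can be captured as a small lookup table, for example as input to a board-level checklist script. The Python sketch below simply transcribes the lists; the dictionary layout and the helper names `_add` and `tie_off` are illustrative assumptions, and VccQSys refers to the System interface output driver supply from Table 3-1.

```python
# Maps each unused input pin to (rail, resistor); resistor is None when the
# text calls for a direct connection.
UNUSED_INPUT_TIE_OFF = {}

def _add(pins, rail, resistor):
    """Record the same tie-off recommendation for a group of pins."""
    for pin in pins:
        UNUSED_INPUT_TIE_OFF[pin] = (rail, resistor)

_add(["JTDI", "JTCK", "JTMS"], "Vcc", "resistor (value not specified)")
# Spare1 and Spare3 as listed for the R10000; the R12000 note in section 3.4
# says these pins should be left unconnected on that part.
_add(["TCA", "TCB", "PLLDis", "Spare1", "Spare3"], "Vss", "100 ohm")
_add(["PLLSpare1", "PLLSpare2", "PLLSpare3", "PLLSpare4", "SelDVCO"], "Vss", None)
_add(["SysNMI*"], "VccQSys", ">= 100 ohm (preferred)")
_add(["SysRdRdy*", "SysWrRdy*", "SysGblPerf*", "SysCyc*"], "Vss", ">= 100 ohm (preferred)")
_add(["SysADChk(7:0)"], "Vss (preferred) or VccQSys", ">= 100 ohm")

def tie_off(pin):
    """Return (rail, resistor) for an unused input pin; raises KeyError if unknown."""
    return UNUSED_INPUT_TIE_OFF[pin]

print(tie_off("SysNMI*"))
```

A lookup that raises on unknown names is a deliberate choice here: a pin missing from the table should be checked against the signal descriptions rather than silently defaulted.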


4. 


Cache Organization and Coherency 


The processor implements a two-level cache structure consisting of separate primary 
instruction and data caches and a joint secondary cache. 


Each cache is two-way set associative and uses a write back protocol; that is, two cache 
blocks are assigned to each set (as shown in Figure 4-1), and a cache store writes data into 
the cache instead of writing it directly to memory. Some time later this data is 
independently written to memory. 


A write-invalidate cache coherency protocol (described later in this chapter) is supported 
through a set of cache states and external coherency requests. 


4.1 Primary Instruction Cache


The processor has an on-chip 32-Kbyte primary instruction cache (also referred to simply 
as the instruction cache), which is a subset of the secondary cache. Organization of the 
instruction cache is shown in Figure 4-1. 


The instruction cache has a fixed block size of 16 words and is two-way set associative with
a least-recently-used (LRU) replacement algorithm.†


The instruction cache is indexed with a virtual address and tagged with a physical address. 
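

The stated size, associativity, and block size determine the number of sets and the width of the index. The Python sketch below works through that arithmetic for the instruction cache. It assumes conventional set-associative indexing (block offset bits followed by index bits), which the manual text here does not spell out; `cache_geometry` is an illustrative helper, not a name from the manual.

```python
WORD_BYTES = 4  # one word is 32 bits

def cache_geometry(size_bytes, ways, block_words):
    """Derive block size, set count, and address field widths for a cache."""
    block_bytes = block_words * WORD_BYTES
    sets = size_bytes // (ways * block_bytes)
    return {
        "block_bytes": block_bytes,
        "sets": sets,
        "offset_bits": block_bytes.bit_length() - 1,  # log2 for powers of two
        "index_bits": sets.bit_length() - 1,
    }

# Primary instruction cache: 32 Kbytes, two-way, 16-word blocks.
icache = cache_geometry(32 * 1024, ways=2, block_words=16)
print(icache)
```

Under these assumptions each way holds 256 blocks of 64 bytes, so 6 offset bits and 8 index bits together cover the 16 Kbytes of one way. The same helper can be applied to the other caches described in this chapter.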


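[Diagram: Way 0 and Way 1, 16 Kbytes each. Each block in a way holds a tag (Tag 0 or Tag 1)
and data words 0 through 15 (Data 0 or Data 1). The virtual index selects a set, which
consists of one block from each way.]


Figure 4-1 Organization of Primary Instruction Cache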


Each instruction cache block is in one of the following two states:

• Invalid
• Valid


† The precise implementation of the LRU algorithm is affected by the speculative execution of
instructions.


An instruction cache block can be changed from one state to the other as a result of any one
of the following events:

• a primary instruction cache read miss
• subset property enforcement
• any of various CACHE instructions
• external intervention exclusive and invalidate requests


These events are illustrated in Figure 4-2, which shows the primary instruction cache state
diagram.


[State diagram with states Invalid and Valid. Arc labels recoverable from the figure:
Subset enforcement; CACHE Index Invalidate (I); CACHE Index Store Tag (I);
CACHE Hit Invalidate (I, S); CACHE Index WriteBack Invalidate (S); Read hit;
Intervention exclusive hit; Invalidate hit. Legend: solid arcs are internally initiated
actions and dashed arcs are externally initiated actions; (I) = Instruction cache,
(S) = Secondary cache.]


Figure 4-2 Primary Instruction Cache State Diagram


4.2 Primary Data Cache


The processor has an on-chip 32-Kbyte primary data cache (also referred to simply as the
data cache), which is a subset of the secondary cache. The data cache uses a fixed block
size of 8 words and is two-way set associative (that is, two cache blocks are assigned to each
set, as shown in Figure 4-3) with an LRU replacement algorithm.†


[Diagram: Way 0 and Way 1, 16 Kbytes each. Each block in a way holds a tag (Tag 0 or Tag 1)
and data words 0 through 7 (Data 0 or Data 1).]


Figure 4-3 Organization of Primary Data Cache


The data cache uses a write back protocol, which means a cache store writes data into the
cache instead of writing it directly to memory. Some time later this data is independently
written to memory, as shown in Figure 4-4.


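[Diagram, with time as the axis: Processor, Primary Cache, Secondary Cache, Main Memory.
A primary write back moves data from the primary cache to the secondary cache; a secondary
write back moves data from the secondary cache to main memory.]


Figure 4-4 Write Back Protocol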


Write back from the primary data cache goes to the secondary cache, and write back from 
the secondary cache goes to main memory, through the system interface. The primary data 
cache is written back to the secondary cache before the secondary cache is written back to 
the system interface. 


† The precise implementation of the LRU algorithm is affected by the speculative execution of
instructions.


The data cache is indexed with a virtual address and tagged with a physical address. Each
primary cache block is in one of the following four states:

• Invalid
• CleanExclusive
• DirtyExclusive
• Shared


A primary data cache block is said to be Inconsistent when the data in the primary cache 
has been modified from the corresponding data in the secondary cache. The primary data 
cache is maintained as a subset of the secondary cache where the state of a block in the 
primary data cache always matches the state of the corresponding block in the secondary 
cache. 


A data cache block can be changed from one state to another as a result of any one of the
following events:

• primary data cache read/write miss
• primary data cache write hit
• subset enforcement
• a CACHE instruction
• external intervention shared request
• intervention exclusive request
• invalidate request


These events are illustrated in Figure 4-5, which shows the primary data cache state 
diagram. 


DCache set locking relaxed (R12000) 


In R10000, when an AQ entry accesses a DCache line, that line is locked into the cache until
the entry graduates, so that the line will not be removed from the cache until the access
completes. If another entry which needs to access exactly the same line arrives in the AQ
before the first completes, the two may share the lock. In this way, a line is locked in the
cache until all accesses to it complete. In order to prevent a deadlock from arising, whenever
a cache line is locked in this way, only the oldest AQ entry can obtain a lock on the other
“way” of the same cache set, thus ensuring that forward progress can be made. This
algorithm can cause problems, because often the oldest entry in the AQ is the one which
already owns the lock on the first way, thus ensuring that no other entries can access the
second way of the cache for that set index. For some algorithms, most notably FFTs, this
can cause severe performance degradation. R12000 allows an entry to obtain the lock on
the second way of a set if it is the oldest entry which does not already own a lock. Thus, any
entries which have already acquired a lock, including those locking the first way, will not
prevent another, younger, entry from accessing that second way.
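

The difference between the two rules can be sketched in a few lines of Python. This is only a model of the eligibility test described above, not the hardware logic: the address queue is represented as a hypothetical list of booleans, oldest entry first, where `True` means the entry already owns a lock.

```python
def may_lock_second_way_r10000(aq_owns_lock, index):
    """R10000 rule: only the oldest AQ entry may lock the other way of the set."""
    return index == 0

def may_lock_second_way_r12000(aq_owns_lock, index):
    """R12000 rule: the oldest entry that does not already own a lock may do so."""
    for i, owns in enumerate(aq_owns_lock):
        if not owns:
            return i == index
    return False  # every entry already owns a lock

# Oldest entry (index 0) already holds the lock on the first way.
aq = [True, False, False]

print(may_lock_second_way_r10000(aq, 1))  # blocked behind the oldest entry
print(may_lock_second_way_r12000(aq, 1))  # allowed: oldest entry without a lock
```

In the R10000 case the second entry is blocked even though the oldest entry has no use for the second way, which is the situation the R12000 change is described as relieving.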


[State diagram with states Invalid, CleanExclusive, DirtyExclusive, and Shared. Arc labels
recoverable from the figure: Subset enforcement; CACHE Index WriteBack Invalidate (D, S);
CACHE Index Store Tag (D); CACHE Hit Invalidate (D, S); CACHE Hit WriteBack Invalidate (D, S);
Read miss obtained CleanExclusive; Read miss obtained DirtyExclusive; Read miss obtained
Shared; Write miss; Read hit; Write hit; Write hit and Upgrade ACK; Intervention shared hit;
Intervention exclusive hit; Invalidate hit. Legend: solid arcs are internally initiated
actions and dashed arcs are externally initiated actions; (S) = Secondary cache,
(D) = Data cache.]


Figure 4-5 Primary Data Cache State Diagram
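

As a reading aid, the main transitions can be written as a small state function. The Python sketch below is a simplification: the direction of each arc is inferred from the arc labels and from conventional write-invalidate behavior rather than taken from the drawing itself, CACHE instructions are omitted, and the event names are hypothetical.

```python
INVALID = "Invalid"
CLEAN_EX = "CleanExclusive"
DIRTY_EX = "DirtyExclusive"
SHARED = "Shared"

def next_state(state, event, obtained=None):
    """Return the next data cache block state for a simplified set of events."""
    if event == "read_miss":
        # The refilled block takes whatever state the read obtained.
        return obtained
    if event == "write_miss":
        return DIRTY_EX
    if state == INVALID:
        return INVALID  # hits cannot occur on an Invalid block
    if event == "write_hit":
        # For a Shared block this assumes the upgrade has been acknowledged.
        return DIRTY_EX
    if event == "intervention_shared_hit":
        return SHARED
    if event in ("intervention_exclusive_hit", "invalidate_hit", "subset_enforcement"):
        return INVALID
    return state  # read hit: no change

state = next_state(INVALID, "read_miss", obtained=SHARED)
state = next_state(state, "write_hit")
print(state)
```

Starting from Invalid, a read miss that obtains the block Shared followed by a write hit (with upgrade acknowledgment) leaves the block DirtyExclusive in this model.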


4.3 Secondary Cache


The R10000 processor must have an external secondary cache, ranging in size from 512
Kbytes to 16 Mbytes, in powers of 2, as set by the SCSize mode bit. The SCBlkSize mode
bit selects a block size of either 16 or 32 words.
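

The permitted sizes and block sizes translate into set counts as follows. The Python sketch below enumerates the configurations; it assumes a two-way cache (as described in this section) and 4-byte words, and the helper names are illustrative rather than taken from the manual.

```python
WORD_BYTES = 4
QUADWORD_BYTES = 16
WAYS = 2

def secondary_cache_sizes():
    """Sizes from 512 Kbytes to 16 Mbytes in powers of 2, in bytes."""
    sizes = []
    size = 512 * 1024
    while size <= 16 * 1024 * 1024:
        sizes.append(size)
        size *= 2
    return sizes

def sets_for(size_bytes, block_words):
    """Number of sets for a given cache size and block size (16 or 32 words)."""
    return size_bytes // (WAYS * block_words * WORD_BYTES)

def quadwords_per_way(size_bytes):
    """Number of quadword-wide locations in one way of the cache."""
    return size_bytes // WAYS // QUADWORD_BYTES

for size in secondary_cache_sizes():
    print(size // 1024, "Kbytes:", sets_for(size, 16), sets_for(size, 32))
```

For the largest configuration, one way holds 2^19 quadwords. That is consistent with the 19-bit secondary cache address bus listed in Table 3-2, although the text here does not state how address bits map onto that bus.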


The secondary cache is two-way set associative (that is, two cache blocks are assigned to
each set, as shown in Figure 4-6) with an LRU replacement algorithm.†


The secondary cache uses a write back protocol, which means a cache store writes data into 
the cache instead of writing it directly to memory. Some time later this data is 
independently written to memory. 


The secondary cache is indexed with a physical address and tagged with a physical address. 


[Diagram: Way 0 and Way 1, 256 Kbytes to 8 Mbytes each. Each block in a way holds a tag
(Tag 0 or Tag 1) and its data words (Data 0 or Data 1).]


Figure 4-6 Organization of Secondary Cache


Each secondary cache block is in one of the following four states:

• Invalid
• CleanExclusive
• DirtyExclusive
• Shared


† The precise implementation of the LRU algorithm is affected by the speculative execution of
instructions.


A secondary cache block can be changed from one state to another as a result of any of the
following events:

• primary cache read/write miss
• primary cache write hit to a Shared or CleanExclusive block
• secondary cache read miss
• secondary cache write hit to a Shared or CleanExclusive block
• a CACHE instruction
• external intervention shared request
• intervention exclusive request
• invalidate request


These events are illustrated in Figure 4-7, which shows the secondary cache state diagram. 


[State diagram with states Invalid, CleanExclusive, DirtyExclusive, and Shared. Arc labels
recoverable from the figure: CACHE Index WriteBack Invalidate (S); CACHE Index Store Tag (S);
CACHE Hit Invalidate (S); CACHE Hit WriteBack Invalidate (S); Read miss obtained
CleanExclusive; Read miss obtained DirtyExclusive; Read miss obtained Shared; Write miss;
Read hit; Write hit; Write hit and Upgrade ACK; Intervention shared hit; Intervention
exclusive hit; Invalidate hit. Legend: solid arcs are internally initiated actions and dashed
arcs are externally initiated actions; (S) = Secondary cache.]


Figure 4-7 Secondary Cache State Diagram


<R12000>


Pad-ring clock slowed 


The clock used to drive data to/from SC around the pad-ring has been slowed to a 2:3 clock 
divisor, thus sometimes adding an additional cycle of latency to secondary-cache accesses. 


SC refill blocking reduced 


In R10000, during the time that an SCache line is being refilled from the system interface
via the “incoming buffer” (IB), no other accesses to the SCache are allowed. If the external
interface sees an ACK to a line that is being refilled before the last words of the SCache line
are received by R10000, this means that several cycles can elapse during which SCache
access is blocked. By breaking the SCache refill transaction into 64-byte blocks, and
allowing other requests to proceed during breaks between the blocks, this effect can be
reduced. R12000 pulls in SCache lines with two “pause points.” The first occurs when
R12000 receives the ACK for a request. If the first two quad-words are already valid in the
Incoming Buffer at that time, then R12000 will proceed to refill the SCache with those two,
and forward the results to the DCache or ICache at the same time as normal. The next two
quad-words will be refilled as they return, thus continuing to block any other access to the
SCache just as today. If, however, when the initial ACK is received, the first two are not valid
(i.e., either 0 or 1 quad-words are valid at that time), then R12000 will “pause” the SCache
refill and wait for both of them to be brought in to the IB. Once the first half is filled in to
the SCache, R12000 will again check the IB to see if an additional 3 quad-words are valid
(thus 7 out of the 8 quad-words in the SCache line should have arrived into the IB).


Until that is the case, R12000 will again “pause” the SCache refill and allow other accesses
to reach the SCache. These two pauses allow for other requests to slip in during an SCache
refill. Using only two pauses both simplifies the logic and reduces bus turnarounds.
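

The two pause points amount to two threshold checks on the incoming buffer. The Python sketch below models only that decision. It assumes an 8-quad-word SCache line, as in the description above, and the function name and arguments are hypothetical.

```python
QUADWORDS_PER_LINE = 8  # assumes the 8-quad-word SCache line described above

def refill_may_proceed(quadwords_written, quadwords_valid_in_ib):
    """Return True if the SCache refill may continue, False if it pauses.

    quadwords_written: quad-words already written into the SCache.
    quadwords_valid_in_ib: quad-words of this line that have arrived in the IB.
    """
    if quadwords_written == 0:
        # First pause point (at the ACK): wait for the first two quad-words.
        return quadwords_valid_in_ib >= 2
    if quadwords_written == QUADWORDS_PER_LINE // 2:
        # Second pause point: wait until 7 of the 8 quad-words have arrived.
        return quadwords_valid_in_ib >= QUADWORDS_PER_LINE - 1
    # Between pause points the refill continues and other accesses stay blocked.
    return True

print(refill_may_proceed(0, 1), refill_may_proceed(4, 7))
```

While the function returns False, other requests are free to reach the SCache in this model; once it returns True, the refill holds the SCache until the next pause point or the end of the line.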


DCache writebacks never piggyback 


In R10000, when a DCache line is written back to SCache, the following line in the DCache
might be written back in a “piggybacked” manner. In order for this to occur, the following
line must have the same tag as the initially-written line, and must be in the “dirty
inconsistent” state. This feature is being dropped from R12000.


DCache writebacks never bypass 


In R10000, when a DCache line is written back to SCache, if the SCache interface is not
otherwise occupied when the writeback begins, the writeback is bypassed directly to the
SCache interface, avoiding the cycles required to write the data into the writeback buffer.
This feature is being dropped from R12000.


4.4 Cache Algorithms


The behavior of the processor when executing load and store instructions is determined by 
the cache algorithm specified for the accessed address. The processor supports five different 
cache algorithms: 


• uncached
• cacheable noncoherent
• cacheable coherent exclusive
• cacheable coherent exclusive on write
• uncached accelerated


Cache algorithms are specified in three separate places, depending upon the access:

• the cache algorithm for the mapped address space is specified on a per-page
  basis by the 3-bit cache algorithm field in the TLB
• the cache algorithm for the kseg0 address space is specified by the 3-bit K0
  field of the CP0 Config register
• the cache algorithm for the xkphys address space is specified by VA[61:59]


Table 4-1 presents the encoding of the 3-bit cache algorithm field used in the TLB
(EntryLo0 and EntryLo1 registers), in the CP0 Config register K0 field for the kseg0
address space, and in VA[61:59] for the xkphys address space.


Table 4-1 Cache Algorithm Field Encodings 


Value   Cache Algorithm 
0       Reserved 
1       Reserved 
2       Uncached 
3       Cacheable noncoherent 
4       Cacheable coherent exclusive 
5       Cacheable coherent exclusive on write 
6       Reserved 
7       Uncached accelerated 
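

The encodings in Table 4-1 can be written down as constants. The following C sketch is illustrative only: the identifier names and the helper functions are assumptions made for this example and are not defined by the processor documentation. It shows how the 3-bit field is taken from VA[61:59] of an xkphys address.

```c
#include <assert.h>
#include <stdint.h>

/* 3-bit cache algorithm field values, as listed in Table 4-1.
 * The identifier names are hypothetical. */
enum cache_algorithm {
    CA_RESERVED_0             = 0,
    CA_RESERVED_1             = 1,
    CA_UNCACHED               = 2,
    CA_CACHEABLE_NONCOHERENT  = 3,
    CA_COHERENT_EXCLUSIVE     = 4,
    CA_COHERENT_EXCL_ON_WRITE = 5,
    CA_RESERVED_6             = 6,
    CA_UNCACHED_ACCELERATED   = 7
};

/* Extract VA[61:59], the cache algorithm field of an xkphys address. */
static inline unsigned xkphys_cache_algorithm(uint64_t va)
{
    return (unsigned)((va >> 59) & 0x7u);
}

/* Hypothetical helper: nonzero for the three reserved encodings. */
static inline int cache_algorithm_is_reserved(unsigned ca)
{
    assert(ca <= 7);
    return ca == CA_RESERVED_0 || ca == CA_RESERVED_1 || ca == CA_RESERVED_6;
}
```

For example, an xkphys address whose upper byte is 0x90 carries the value 2 (Uncached) in VA[61:59], and one whose upper byte is 0xB8 carries the value 7 (Uncached accelerated).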
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Descriptions of the Cache Algorithms 


This section describes the cache algorithms listed in Table 4-1. 


Uncached 


Loads and stores under the Uncached cache algorithm bypass the primary and secondary 
caches. They are issued directly to the System interface using processor double/single/ 
partial-word read or write requests. 


Cacheable Noncoherent 


Under the Cacheable noncoherent cache algorithm, load and store secondary cache misses 
result in processor noncoherent block read requests. External agents containing caches 
need not perform a coherency check for such processor requests. 


Cacheable Coherent Exclusive 


Under the Cacheable coherent exclusive cache algorithm, load and store secondary cache 
misses result in processor coherent block read exclusive requests. Such processor requests 
indicate to external agents containing caches that a coherency check must be performed and 
that the cache block must be returned in an Exclusive state. 


Cacheable Coherent Exclusive on Write 


The Cacheable coherent exclusive on write cache algorithm is similar to the Cacheable 
coherent exclusive cache algorithm except that load secondary cache misses result in 
processor coherent block read shared requests. Such processor requests indicate to external 
agents containing caches that a coherency check must be performed and that the cache 
block may be returned in either a Shared or Exclusive state. 


Store hits to a Shared block result in a processor upgrade request. This indicates to external 
agents containing caches that the block must be invalidated. 
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Uncached Accelerated 
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The R10000 processor implements a new cache algorithm, Uncached accelerated. This 
allows the kernel to mark the TLB entries for certain regions of the physical address space, 
or certain blocks of data, as uncached while signalling to the hardware that data movement 
optimizations are permissible. This permits the hardware implementation to gather a 
number of uncached writes together, either a series of writes to the same address or 
sequential writes to all addresses in the block, into an uncached accelerated buffer and then 
issue them to the system interface as processor block write requests. The uncached 
accelerated algorithm differs from the uncached algorithm in that block write gathering is 
performed; under the uncached algorithm no gathering takes place. 


There is no difference between an uncached accelerated load and an uncached load. Only 
word or doubleword stores can take advantage of this mode. 


Stores under the Uncached accelerated cache algorithm bypass the primary and secondary 
caches. Stores to identical or sequential addresses are gathered in the uncached buffer, 
described in Chapter 6, the section titled “Uncached Buffer.” 


Completely gathered uncached accelerated blocks are issued to the System interface as 
processor block write requests. Incompletely gathered uncached accelerated blocks are 
issued to the System interface using processor double/single-word write requests; this is 
also described in Chapter 6, the section titled “Uncached Buffer.” 
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4.5 Relationship Between Cached and Uncached Operations 


Uncached and uncached accelerated load and store instructions are executed in order, and 
non-speculatively. Such accesses are buffered in the uncached buffer by the processor until 
they can be issued to the System interface. 


All uncached and uncached accelerated accesses retain program order within the uncached 
buffer. The processor continues issuing cached accesses while uncached accesses are 
queued in the uncached buffer. 


NOTE: Cached accesses do not probe the uncached buffer for conflicts. 


Buffered uncached stores prevent a SYNC instruction from graduating. However, buffered 
uncached accelerated stores do not prevent a SYNC instruction from graduating. The 
processor continues issuing cached accesses speculatively and out of order beyond a SYNC 
instruction that is waiting to graduate. 


An uncached load may be used to guarantee that the uncached buffer is flushed of all 
uncached and uncached accelerated accesses. 


A SYNC instruction and the SysGblPerf* signal may be used to guarantee that all cache 
accesses and uncached stores have been globally performed as described in Chapter 6, the 
section titled “SysGblPerf* Signal.” 


An uncached load followed by a SYNC instruction may be used to guarantee that all cache 
accesses, uncached accesses, and uncached accelerated accesses have been globally 
performed. 
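

The last rule above (an uncached load followed by a SYNC) could be expressed in C roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the pointer is assumed to refer to an address mapped with the Uncached algorithm, and the GCC/Clang inline assembly and `__sync_synchronize()` builtin are used only so that the example also compiles on a non-MIPS host. Whether the guarantee holds in a given system also depends on the SysGblPerf* signal described in Chapter 6.

```c
#include <assert.h>
#include <stdint.h>

/* Full barrier: the MIPS SYNC instruction when built for MIPS; a compiler
 * builtin elsewhere so that the sketch still compiles on a host machine. */
static inline void sync_barrier(void)
{
#if defined(__mips__)
    __asm__ __volatile__("sync" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Hypothetical helper: read one word through an (assumed) uncached mapping,
 * then execute SYNC. Returns the word that was read. */
static inline uint32_t flush_uncached_and_sync(const volatile uint32_t *uncached_addr)
{
    assert(uncached_addr != 0);
    uint32_t value = *uncached_addr; /* load issued first, in program order */
    sync_barrier();                  /* SYNC follows the load */
    return value;
}
```

The load is placed first because, per the text, an uncached load flushes the uncached buffer of earlier uncached and uncached accelerated accesses, whereas buffered uncached accelerated stores alone do not hold up a SYNC.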
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4.6 Cache Algorithms and Processor Requests 


The cache algorithm determines the type of processor request generated for secondary 
cache load misses, secondary cache store misses, and store hits. 
Table 4-2 presents the relationship between the cache algorithm and processor requests. 


Table 4-2. Cache Algorithms and Processor Requests 


Cache Algorithm                 Load Miss                         Store Miss                         Store Hit 
Uncached                        Double/single/partial-word read   Double/single/partial-word write   NA 
Cacheable noncoherent           Noncoherent block read            Noncoherent block read             Upgrade if Shared* 
Cacheable coherent exclusive    Coherent block read exclusive     Coherent block read exclusive      Upgrade if Shared* 
Cacheable coherent exclusive    Coherent block read shared        Coherent block read exclusive      Upgrade if Shared 
on write 
Uncached accelerated            Double/single/partial-word read   Gather identical or sequential     NA 
                                                                  double/single-word stores in the 
                                                                  uncached buffer. Block write 
                                                                  for completely gathered blocks. 
                                                                  Double/single-word write for 
                                                                  incompletely gathered blocks. 
                                                                  Partial-word write for partial- 
                                                                  word stores. 


* Should not occur under normal circumstances. Most systems return the Exclusive state for a 
cacheable noncoherent line; therefore, the Shared state is not normal. 
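

Table 4-2 is effectively a lookup from (cache algorithm, event) to a request type. The C sketch below restates it as a function; all identifiers are hypothetical names chosen for this example, and the uncached accelerated store case is collapsed into a single "gathered write" class rather than modelling the uncached buffer.

```c
#include <assert.h>

/* Cache algorithm encodings from Table 4-1 (reserved values omitted). */
enum cache_algorithm {
    CA_UNCACHED               = 2,
    CA_NONCOHERENT            = 3,
    CA_COHERENT_EXCL          = 4,
    CA_COHERENT_EXCL_ON_WRITE = 5,
    CA_UNCACHED_ACCEL         = 7
};

/* The three columns of Table 4-2 (names are illustrative). */
enum cache_event { EV_LOAD_MISS, EV_STORE_MISS, EV_STORE_HIT_SHARED };

/* Request classes named in Table 4-2 (names are illustrative). */
enum proc_request {
    REQ_NONE,                       /* "NA" in the table */
    REQ_WORD_READ,                  /* double/single/partial-word read */
    REQ_WORD_WRITE,                 /* double/single/partial-word write */
    REQ_GATHERED_WRITE,             /* block or word writes after gathering */
    REQ_NONCOHERENT_BLOCK_READ,
    REQ_COHERENT_BLOCK_READ_EXCL,
    REQ_COHERENT_BLOCK_READ_SHARED,
    REQ_UPGRADE
};

static enum proc_request request_for(enum cache_algorithm ca, enum cache_event ev)
{
    assert(ev == EV_LOAD_MISS || ev == EV_STORE_MISS || ev == EV_STORE_HIT_SHARED);

    /* A store hit to a Shared block upgrades under every cacheable algorithm. */
    int cacheable = (ca == CA_NONCOHERENT || ca == CA_COHERENT_EXCL ||
                     ca == CA_COHERENT_EXCL_ON_WRITE);
    if (ev == EV_STORE_HIT_SHARED)
        return cacheable ? REQ_UPGRADE : REQ_NONE;

    switch (ca) {
    case CA_UNCACHED:
        return ev == EV_LOAD_MISS ? REQ_WORD_READ : REQ_WORD_WRITE;
    case CA_UNCACHED_ACCEL:
        return ev == EV_LOAD_MISS ? REQ_WORD_READ : REQ_GATHERED_WRITE;
    case CA_NONCOHERENT:
        return REQ_NONCOHERENT_BLOCK_READ;
    case CA_COHERENT_EXCL:
        return REQ_COHERENT_BLOCK_READ_EXCL;
    case CA_COHERENT_EXCL_ON_WRITE:
        /* Only loads ask for a Shared copy; stores still read exclusive. */
        return ev == EV_LOAD_MISS ? REQ_COHERENT_BLOCK_READ_SHARED
                                  : REQ_COHERENT_BLOCK_READ_EXCL;
    default:
        return REQ_NONE;            /* reserved encodings */
    }
}
```

The only row in which load and store misses differ among the cacheable algorithms is cacheable coherent exclusive on write, which is what gives that algorithm its name.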
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4.7 Cache Block Ownership 


The processor requires cache blocks to have a single owner at all times. The owner is 
responsible for providing the current contents of the cache block to any requestor. 


The processor uses the following ownership rules: 


• The processor assumes ownership of a cache block if the state of the cache 
  block becomes DirtyExclusive. For a processor block read request, the 
  processor assumes ownership of the block after receiving the last doubleword 
  of a DirtyExclusive external block data response and an external ACK 
  completion response. For a processor upgrade request, the processor assumes 
  ownership of the block after receiving an external ACK completion response. 


• The processor gives up ownership of a cache block if the state of the cache 
  block changes to Invalid, CleanExclusive, or Shared. 


• CleanExclusive and Shared cache blocks are always considered to be owned 
  by memory. 
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5. 


Secondary Cache Interface 


The processor supports a mandatory secondary cache by providing an internal secondary 
cache controller with a dedicated secondary cache port. 


The cache’s tag and data arrays each consist of an external bank of industry-standard 
synchronous SRAM (SSRAM). This SSRAM must have registered inputs and outputs, 
asynchronous output enables, and use the late write protocol (data is expected one cycle 
after the address). 
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5.1 Tag and Data Arrays 


Errata 


The secondary cache consists of a 138-bit wide data array (128 data bits + 9 ECC bits + 1 
parity bit) and a 33-bit wide tag array (26 tag bits + 7 ECC bits), as shown in Figure 5-1. 
ECC is supported for both the data and tag arrays to improve data integrity. 


Data Array (10 check bits, 128 data bits): 
    bit 137: P (parity)    bits 136:128: ECC    bits 127:0: Data 

Tag Array (7 check bits, 26 tag bits): 
    bits 32:26: ECC    bits 25:0: Tag 


Figure 5-1 Secondary Cache Data and Tag Array 


The secondary cache is implemented as a two-way set associative, combined instruction/ 
data cache, which is physically addressed and physically tagged, as described in Chapter 4, 
the section titled “Cache Organization and Coherency.” 


The SCSize mode bits specify the secondary cache size; the minimum secondary cache size is 
512 Kbytes and the maximum secondary cache size is 16 Mbytes, in powers of 2 (512 
Kbytes, 1 Mbyte, 2 Mbytes, etc.). 


The SCBlkSize mode bit specifies the secondary cache block size. When negated, the 
block size is 16 words, and when asserted, the block size is 32 words. 
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5.2 Secondary Cache Interface Frequencies 


Errata 


The secondary cache interface operates at the frequency of SCClk, which is derived from 
PClk. The SCClkDiv mode bits select a PClk to SCClk divisor of 1, 1.5, 2, 2.5, or 3, using 
the formula described in Chapter 7, the section titled “Secondary Cache Clock.” 


Synchronization between PClk and SCClk is performed internally and is invisible to 
the system. The processor supplies six complementary copies of the secondary cache clock 
on SCClk(5:0) and SCClk(5:0)*. 


The outputs and inputs at this interface are triggered by an internal SCClk. The relationship 
between the internal SCClk and the external SCClk[5:0]/SCClk[5:0]* can be 
programmed during boot time by setting the SCClkTap mode bits (see the section titled 
“Mode Bits” in Chapter 8 for detail on mode bits). 
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5.3 Secondary Cache Indexing 


Indexing the Data Array 



The secondary cache data array width is one quadword, and therefore PA(3:0), which 
specify a byte within a quadword, are unused by the Secondary Cache interface. 


Since the maximum secondary cache size is 16 Mbytes (8 Mbytes per way), each way 
requires a maximum of 23 bits to index a byte within a selected way, or 19 bits to index a 
quadword within a way. Consequently, the processor supplies PA(22:4) on 
SC(A,B)Addr(18:0) to index a quadword within a way. The processor selects a secondary 
cache data way with the SC(A,B)DWay signal. 


Table 5-1 presents the secondary cache data array index for each secondary cache size; for 
instance, a 4 Mbyte cache uses the 17 address bits, PA(20:4) on SC(A,B)Addr(16:0), 
concatenated with the way bit, SC(A,B)DWay, to index a quadword within a 2 Mbyte way. 


Table 5-1 Secondary Cache Data Array Index 


SCSize      Secondary                                                   Physical 
Mode Bits   Cache Size   Secondary Cache Data Array Index               Address 
0           512 Kbyte    SC(A,B)DWay || SC(A,B)Addr(13:0)               PA(17:4) 
1           1 Mbyte      SC(A,B)DWay || SC(A,B)Addr(14:0)               PA(18:4) 
2           2 Mbyte      SC(A,B)DWay || SC(A,B)Addr(15:0)               PA(19:4) 
3           4 Mbyte      SC(A,B)DWay || SC(A,B)Addr(16:0)               PA(20:4) 
4           8 Mbyte      SC(A,B)DWay || SC(A,B)Addr(17:0)               PA(21:4) 
5           16 Mbyte     SC(A,B)DWay || SC(A,B)Addr(18:0)               PA(22:4) 
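

The pattern in Table 5-1 is regular: each step of SCSize doubles the cache and adds one address bit. A minimal C sketch of that arithmetic follows; the function names are hypothetical, and the returned index simply places the way bit above the address bits, which is one reading of the "||" concatenation in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Secondary cache size in bytes for SCSize = 0..5 (512 Kbytes to 16 Mbytes). */
static uint32_t sc_size_bytes(unsigned sc_size)
{
    assert(sc_size <= 5);
    return (UINT32_C(512) * 1024) << sc_size;
}

/* Number of SC(A,B)Addr bits in use: 14 for 512 Kbytes up to 19 for 16 Mbytes. */
static unsigned sc_data_addr_bits(unsigned sc_size)
{
    assert(sc_size <= 5);
    return 14 + sc_size;
}

/* Data array index: way bit concatenated above PA(n:4), following Table 5-1. */
static uint32_t sc_data_index(uint64_t pa, unsigned way, unsigned sc_size)
{
    unsigned bits = sc_data_addr_bits(sc_size);
    /* PA(3:0) selects a byte within the quadword and is dropped. */
    uint32_t addr = (uint32_t)((pa >> 4) & ((UINT32_C(1) << bits) - 1));
    assert(way <= 1);
    return ((uint32_t)way << bits) | addr;
}
```

For a 4 Mbyte cache (SCSize = 3) this yields 17 address bits, PA(20:4), matching the example given in the text before the table.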


Indexing the Tag Array 


The processor supplies the secondary cache tag array’s least significant index bit on 
SCTagLSBAddr to support two block sizes without system hardware changes. This signal 
functions normally as a least significant index bit when the secondary cache block size is 
16 words. However, when the secondary cache block size is 32 words, this signal is always 
negated, since only half as many tags are required. The processor supplies the secondary 
cache tag way on SCTWay. 


Table 5-2 presents the secondary cache tag array index for each secondary cache size; it 
shows that each index is composed of physical address bits driven onto SC(A,B)Addr, 
concatenated with SCTWay and SCTagLSBAddr. 


Table 5-2 Secondary Cache Tag Array Index 


SCSize      Secondary 
Mode Bits   Cache Size   Secondary Cache Tag Array Index 
0           512 Kbyte    SCTWay || SC(A,B)Addr(13:3) || SCTagLSBAddr 
1           1 Mbyte      SCTWay || SC(A,B)Addr(14:3) || SCTagLSBAddr 
2           2 Mbyte      SCTWay || SC(A,B)Addr(15:3) || SCTagLSBAddr 
3           4 Mbyte      SCTWay || SC(A,B)Addr(16:3) || SCTagLSBAddr 
4           8 Mbyte      SCTWay || SC(A,B)Addr(17:3) || SCTagLSBAddr 
5           16 Mbyte     SCTWay || SC(A,B)Addr(18:3) || SCTagLSBAddr 


For a system design that only supports a secondary cache block size of 32 words, the 
secondary cache tag array need not use SCTagLSBAddr as an index bit. 
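

The tag array index in Table 5-2 can be sketched in C as shown below. The function name is hypothetical. Two points are inferences rather than statements from the text: SC(A,B)Addr(k) is taken to carry PA(k+4), as in Table 5-1, and SCTagLSBAddr is taken to be PA(6) for 16-word (64-byte) blocks, since that is the lowest bit of the block address.

```c
#include <assert.h>
#include <stdint.h>

/* Tag array index: SCTWay || SC(A,B)Addr(n:3) || SCTagLSBAddr (Table 5-2).
 * blk32 is nonzero when the secondary cache block size is 32 words. */
static uint32_t sc_tag_index(uint64_t pa, unsigned way, unsigned sc_size, int blk32)
{
    assert(sc_size <= 5 && way <= 1);

    unsigned top   = 13 + sc_size;       /* highest SC(A,B)Addr bit used */
    unsigned nbits = top - 3 + 1;        /* 11 bits (512 Kbytes) to 16 bits */

    /* SC(A,B)Addr(3) corresponds to PA(7) under the stated assumption. */
    uint32_t addr = (uint32_t)((pa >> 7) & ((UINT32_C(1) << nbits) - 1));

    /* Always negated for 32-word blocks; assumed to be PA(6) otherwise. */
    uint32_t lsb = blk32 ? 0u : (uint32_t)((pa >> 6) & 1u);

    return ((uint32_t)way << (nbits + 1)) | (addr << 1) | lsb;
}
```

With 32-word blocks the low index bit is always zero, so only every other tag entry is used, which is why such a design may leave SCTagLSBAddr unconnected.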
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5.4 Secondary Cache Way Prediction Table 


The primary and secondary caches are two-way set associative. However, the 
implementation of the secondary cache is different than the primary caches. 


The primary caches read simultaneously from two separate tag arrays, corresponding to 
each way in the cache, and then select the data based on the result of two parallel tag 
compares. 


The secondary cache does not use this implementation because it would either require too 
many pins to read in two full copies of the data and tags, or add latency to externally 
multiplex two banks of memory. Instead, a way prediction table is used to determine which 
way to read from first. 


The way prediction table is internal to the processor and has 8K one-bit entries, each entry 
corresponding to a pair of secondary cache blocks. The bit entry indicates which way of the 
addressed set has been most-recently used (MRU). When the secondary cache is accessed, 
this prediction bit is used as an address bit; thus the two ways in the secondary cache are 
shared in the same SSRAM bank. 


The secondary cache way prediction table is indexed with a subset of 11 to 13 bits of the 
physical address, based on both the secondary cache block size and the secondary cache 
size, as shown in Table 5-3. “0 ||” indicates a zero bit concatenated to the address to pad 
the index out to a full 13 bits. 


Table 5-3 Secondary Cache Way Prediction Table Index 


SCSize      Secondary        SCBlkSize   Secondary Cache   Secondary Cache 
Mode Bits   Cache Size       Mode Bit    Block Size        Way Prediction Table Index 
0           512 Kbyte        0           16-word           0 || PA(17:6) 
0           512 Kbyte        1           32-word           0 || 0 || PA(17:7) 
1           1 Mbyte          0           16-word           PA(18:6) 
1           1 Mbyte          1           32-word           0 || PA(18:7) 
2 to 5      2M to 16 Mbyte   0           16-word           PA(18:6) 
2 to 5      2M to 16 Mbyte   1           32-word           PA(19:7) 
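

The index selection in Table 5-3 can be sketched as a small C function. The function name is hypothetical, and the sketch models only the 13-bit index shown in the table as printed here; the leading "0 ||" padding falls out naturally because the unused upper bits of the result are zero.

```c
#include <assert.h>
#include <stdint.h>

/* Way prediction table index per Table 5-3.
 * blk32 is nonzero when the secondary cache block size is 32 words. */
static unsigned way_pred_index(uint64_t pa, unsigned sc_size, int blk32)
{
    assert(sc_size <= 5);

    unsigned lo = blk32 ? 7 : 6;    /* lowest PA bit of the block address */
    unsigned hi;                    /* highest PA bit used by the index */

    if (sc_size == 0)
        hi = 17;                    /* 512 Kbytes */
    else if (sc_size == 1)
        hi = 18;                    /* 1 Mbyte */
    else
        hi = blk32 ? 19 : 18;       /* 2 Mbytes and larger */

    unsigned nbits = hi - lo + 1;   /* 11, 12, or 13 bits */
    return (unsigned)((pa >> lo) & ((1u << nbits) - 1u));
}
```

Every result is below 8192, so it always fits a table of 8K one-bit entries; caches larger than the table can cover simply share entries between sets.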


Errata 
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Three states are possible in the way prediction table: 
• the desired data is in the predicted way 

• the desired data is in the non-predicted way 

• the desired data is not in the secondary cache 


The tags for both ways are read “underneath” the data access cycles in order to discern as 
rapidly as possible which of these states applies. This reading is possible because it takes 
two accesses to read a primary data block (8 words) and four accesses to read a primary 
instruction block (16 words); thus the bandwidth needed to read the tag array twice exists 
in all cases. Only an extra address pin to the tag array is needed to make this operation 
parallel, and this is implemented by the SCTWay pin. 


The three possible states are handled in the following manner: 


• If, after reading the tags for both ways, it is discovered that the data exists in 
  the predicted way, the processor continues normally. 


• If the data exists in the non-predicted way, the processor accesses this non- 
  predicted way in the secondary cache and updates the way prediction table to 
  point to this way. 


• If the access misses in both ways of the secondary cache, the data is fetched 
  from the system interface. If the state of the predicted way is found to be 
  invalid, the fetched data is placed in it and the MRU is unchanged. However, if 
  the state of the predicted way is found to be valid then the fetched data is placed 
  into the non-predicted way, and the way prediction table is updated to point to 
  this way since it is now the most-recently-used. 
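

The three cases above amount to a small decision procedure. The following C sketch models it; the type and function names are hypothetical, and the sketch captures only the way selection and MRU update described in the list, not the timing of the accesses.

```c
#include <assert.h>

/* Outcome of comparing the tags of both ways (names are illustrative). */
enum sc_lookup { HIT_PREDICTED, HIT_NON_PREDICTED, MISS_BOTH };

struct sc_action {
    unsigned way;      /* way that supplies or receives the data */
    unsigned new_mru;  /* value of the prediction bit afterwards */
    int reissue;       /* nonzero: secondary cache must be accessed again */
    int refill;        /* nonzero: data is fetched from the system interface */
};

static struct sc_action resolve_way(unsigned predicted, enum sc_lookup result,
                                    int predicted_way_valid)
{
    assert(predicted <= 1);
    struct sc_action a = { predicted, predicted, 0, 0 };

    switch (result) {
    case HIT_PREDICTED:
        break;                          /* continue normally */
    case HIT_NON_PREDICTED:
        a.way = predicted ^ 1u;         /* access the other way */
        a.new_mru = a.way;              /* and point the table at it */
        a.reissue = 1;
        break;
    case MISS_BOTH:
        a.refill = 1;
        if (predicted_way_valid) {      /* keep the MRU block, replace the other */
            a.way = predicted ^ 1u;
            a.new_mru = a.way;
        }                               /* else fill the invalid predicted way */
        break;
    }
    return a;
}
```

On a miss the refill therefore never displaces a valid most-recently-used block, which gives the replacement policy a least-recently-used character within each set.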


The way prediction table can cover up to a 2 Mbyte secondary cache when the secondary 
cache block size is 32 words. If the secondary cache exceeds this size, the accuracy of the 
way prediction table diminishes slightly. However, the extremely large performance gain 
made by making the secondary cache larger far outstrips any performance loss in the way 
prediction table. 


Increased the Way Prediction Table (MRU table) to 16K single-bit entries 


The size of the table has been increased to 16K entries, so that 4MB caches with 128B lines 
or 2MB caches with 64B lines can be fully mapped. 


Direct Cache Test Mode 


Due to the increased size of the Way Prediction Table, Direct Cache Test Mode has been 
modified for testing the Way Prediction Table. 
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5.5 Secondary Cache Tag 


The secondary cache tag, transferred on the SCTag(25:0) bus, is divided into three fields, 
as shown in Figure 5-2 below. 


Bits     25:4           3:2     1:0 
Field    Physical Tag   PIdx    State 
Width    22             2       2 


Figure 5-2. Secondary Cache Tag Fields 


SCTag(25:4), Physical Tag 


The minimum secondary cache size is 512 Kbytes (256 Kbytes per way), so a minimum of 
18 bits are required to index a data byte within a selected way. Since the processor supports 
40 physical bits, a maximum of 22 bits are required for the physical tag: 


40 physical address bits - 18 minimum required = 22 


Consequently, the processor supplies the 22 physical address bits, PA(39:18), on 
SCTag(25:4) for the physical tag. 


When the secondary cache is larger than the minimum size, the secondary cache tag array 
must still maintain the full physical tag supplied by the processor, even though some bits 
are redundant. 


SCTag(3:2), PIdx 


Bits SCTag(3:2) of the secondary cache tag contain the primary cache index, PIdx. 


The PIdx field contains VA(13:12), which are the two lowest virtual address bits above the 
minimum 4 Kbyte page size. This field is written into the secondary cache tag during a 
secondary cache refill. For each processor-initiated secondary cache access, the virtual 
address bits are compared with the PIdx field of the secondary cache tag. If a mismatch 
occurs, a virtual coherency condition exists and the value of the PIdx field is used by 
internal control logic to purge primary cache locations, so that all primary cache blocks 
holding valid data have indices known to the secondary cache. This mechanism, unlike that 
of the R4400 processor, is implemented in hardware. It helps preserve the integrity of 
cached accesses to a physical address using different virtual addresses, an occurrence called 
virtual aliasing. For each external coherency request, the PIdx field of the secondary cache 
tag provides a mechanism to locate subset lines in the primary caches. 


SCTag(1:0), Cache Block State 


The lower two bits of the secondary cache tag, SCTag(1:0), contain the cache block state, 
which can be Invalid, Shared, CleanExclusive, or DirtyExclusive as shown in Table 5-4. 


Table 5-4 Secondary Cache Tag State Field Encoding 


SCTag(1:0) State 
0 Invalid 
1 Shared 
2 CleanExclusive 
3 DirtyExclusive 


Since the secondary cache tags are updated immediately for stores to the primary data 
cache, and all caches use a write back protocol, the data in the secondary cache may not 
always be consistent with data in the primary cache even though the tags always reflect the 
correct state of a secondary cache block. 
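

The three tag fields described in this section can be packed and unpacked with simple shifts and masks. The C sketch below is illustrative; the type and function names are hypothetical, and the ownership helper restates the rule from the section titled “Cache Block Ownership” in Chapter 4.

```c
#include <assert.h>
#include <stdint.h>

/* Cache block states, encoded as in Table 5-4. */
enum sc_state {
    SC_INVALID         = 0,
    SC_SHARED          = 1,
    SC_CLEAN_EXCLUSIVE = 2,
    SC_DIRTY_EXCLUSIVE = 3
};

struct sc_tag {
    uint32_t physical_tag;  /* SCTag(25:4) = PA(39:18) */
    unsigned pidx;          /* SCTag(3:2)  = VA(13:12) */
    enum sc_state state;    /* SCTag(1:0) */
};

/* Build the 26-bit SCTag value from an address pair and a state. */
static uint32_t sc_tag_pack(uint64_t pa, uint64_t va, enum sc_state state)
{
    uint32_t tag  = (uint32_t)((pa >> 18) & 0x3FFFFFu);  /* 22 bits */
    uint32_t pidx = (uint32_t)((va >> 12) & 0x3u);       /* 2 bits */
    return (tag << 4) | (pidx << 2) | (uint32_t)state;
}

/* Split a raw SCTag(25:0) value into its three fields. */
static struct sc_tag sc_tag_unpack(uint32_t raw)
{
    struct sc_tag t;
    t.physical_tag = (raw >> 4) & 0x3FFFFFu;
    t.pidx         = (raw >> 2) & 0x3u;
    t.state        = (enum sc_state)(raw & 0x3u);
    return t;
}

/* The processor owns a block only while it is DirtyExclusive. */
static int processor_owns(enum sc_state state)
{
    assert(state >= SC_INVALID && state <= SC_DIRTY_EXCLUSIVE);
    return state == SC_DIRTY_EXCLUSIVE;
}
```

A virtual coherency condition, in these terms, is a mismatch between bits 13:12 of the accessing virtual address and the `pidx` field of the tag that hits.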
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5.6 Read Sequences 


There are five basic read sequences: 

• a 4-word read 

• an 8-word read 

• a 16-word read 

• a 32-word read 

• a tag read 


Errata 


The SCClk referred to in the secondary cache read and write timing diagrams is an internal 
SCClk. The relationship between this internal SCClk and the external SCClk[5:0]/ 
SCClk[5:0]* can be programmed during boot time by setting the SCClkTap mode bits 
(see the section titled “Mode Bits” in Chapter 8 for detail on mode bits). 
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4-Word Read Sequence 


[Figure 5-3 timing diagram. Signals shown: SCClk, SC(A,B)Addr(18:0), SCTagLSBAddr, 
SC(A,B)DWay, SCData(127:0), SCDataChk(9:0), SC(A,B)DOE*, SC(A,B)DWr*, 
SC(A,B)DCS*, SCTWay, SCTag(25:0), SCTagChk(6:0), SCTOE*, SCTWr*, SCTCS*.] 
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A 4-word read sequence is performed by a CACHE Index Load Data (S) instruction to read 
a doubleword of data and 10 check bits from the secondary cache data array. 


Figure 5-3 depicts a secondary cache 4-word read sequence. A quadword is read from the 
index specified by PA(23:6), and the way specified by VA(0) of the CACHE instruction. 


The doubleword specified by VA(3) is then stored into the CP0 TagHi and TagLo registers, 
and the corresponding check bits are stored into the CP0 ECC(9:0) register. The data may 
be examined by copying the CP0 TagHi, TagLo, and ECC registers to the general registers 
with the MFC0 instruction. 


Figure 5-3 4-Word Read Sequence 


8-Word Read Sequence 


[Figure 5-4 timing diagram. Signals shown: SCClk, SC(A,B)Addr(18:0), SCTagLSBAddr, 
SC(A,B)DWay, SCData(127:0), SCDataChk(9:0), SC(A,B)DOE*, SC(A,B)DWr*, 
SC(A,B)DCS*, SCTWay, SCTag(25:0), SCTagChk(6:0), SCTOE*, SCTWr*, SCTCS*.] 
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An 8-word read sequence refills the primary data cache from the secondary cache after a 
primary data cache miss. 


Figure 5-4 depicts a secondary cache 8-word read sequence. In it, SC(A,B)DWay and 
SCTWay are driven with value X on the first address cycle, which is obtained from the way 
prediction table. 


On the next address cycle, SCTWay is complemented in order to read the tag from the non- 
predicted way of the addressed set. SC(A,B)DWay is not changed since it is assumed that 
the way prediction table is correct and the read is likely to hit in the predicted way. 


The tag for the non-predicted way is returned to the processor in the same cycle as the 
second quadword of data. Reads that miss in the predicted way, but hit in the non-predicted 
way, are noted by the internal control logic and reissued to the secondary cache as soon as 
possible. 


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Figure 5-4 8-Word Read Sequence 
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16 or 32-Word Read Sequence 


[Figure 5-5 timing diagram. Signals shown: SCClk, SC(A,B)Addr(18:0), SCTagLSBAddr, 
SC(A,B)DWay, SCData(127:0), SCDataChk(9:0), SC(A,B)DOE*, SC(A,B)DWr*, 
SC(A,B)DCS*, SCTWay, SCTag(25:0), SCTagChk(6:0), SCTOE*, SCTWr*, SCTCS*.] 


A 16-word read sequence refills the primary instruction cache from the secondary cache 
after a primary instruction cache miss. A 16-word read sequence is also performed when 
the secondary cache block size is 16 words, and a DirtyExclusive secondary cache block 
must be written back to the System interface. 


A 32-word read sequence is performed when the secondary cache block size is 32 words, 
and a DirtyExclusive secondary cache block must be written back to the System interface. 


Figure 5-5 depicts a secondary cache 16 or 32-word read sequence. This is similar to an 8- 
word read sequence except that more addresses must be issued, in order to read the 
appropriate number of quadwords. 




Figure 5-5 16 or 32-Word Read Sequence 
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Tag Read Sequence 
A tag read sequence is performed when the state of a secondary cache block is required, but 
it is not necessary to access the data array. This sequence is used for the CACHE Index 
Load Tag (S) instruction. 
Figure 5-6 depicts a secondary cache tag read sequence. 
[Figure 5-6 timing diagram. Signals shown: SCClk, SC(A,B)Addr(18:0), SCTagLSBAddr, 
SCData(127:0), SCDataChk(9:0), SC(A,B)DOE*, SC(A,B)DWr*, SC(A,B)DCS*, 
SCTWay, SCTag(25:0), SCTagChk(6:0), SCTOE*, SCTWr*, SCTCS*.] 


Figure 5-6 Tag Read Sequence 
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5.7 Write Sequences 


Errata 


There are five basic write sequences: 

• a 4-word write 

• an 8-word write 

• a 16-word write 

• a 32-word write 

• a tag write 


The SCClk referred to in the secondary cache read and write timing diagrams is an internal 
SCClk. The relationship between this internal SCClk and the external SCClk[5:0]/ 
SCClk[5:0]* can be programmed during boot time by setting the SCClkTap mode bits 
(see the section titled “Mode Bits” in Chapter 8 for detail on mode bits). 
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4-Word Write Sequence 
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A 4-word write sequence is performed by a CACHE Index Store Data (S) instruction to 
store a quadword of data and 10 check bits into the secondary cache data array. 


Figure 5-7 depicts a secondary cache 4-word write sequence. A quadword is written to the 
index specified by PA(23:6), and the way specified by VA(0) of the CACHE instruction. 


A doubleword specified by VA(3) is obtained from the CP0 TagHi and TagLo registers, and 
the other half of the quadword is padded with zeros. Normal ECC and parity generation is 
bypassed and the check field of the data array is written with the contents of the CP0 
ECC(9:0) register. 


[Figure 5-7 timing diagram. Signals shown: SCClk, SC(A,B)Addr(18:0), SCTagLSBAddr, 
SC(A,B)DWay, SCData(127:0), SCDataChk(9:0), SC(A,B)DOE*, SC(A,B)DWr*, 
SC(A,B)DCS*, SCTWay, SCTag(25:0), SCTagChk(6:0), SCTOE*, SCTWr*, SCTCS*.] 
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Figure 5-7 4-Word Write Sequence 
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8-Word Write Sequence 
An 8-word write sequence writes back a dirty block from the primary data cache to the 
secondary cache. 
Figure 5-8 depicts a secondary cache 8-word write sequence. SC(A,B)DWay is driven 
with the way bit obtained from the primary data cache tag. The secondary cache tag is not 
written since it was previously updated when the primary data cache block was modified. 
[Figure 5-8 timing diagram. Signals shown: SCClk, SC(A,B)Addr(18:0), SCTagLSBAddr, 
SC(A,B)DWay, SCData(127:0), SCDataChk(9:0), SC(A,B)DOE*, SC(A,B)DWr*, 
SC(A,B)DCS*, SCTWay, SCTag(25:0), SCTagChk(6:0), SCTOE*, SCTWr*, SCTCS*.] 


Figure 5-8 8-Word Write Sequence 
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16 or 32-Word Write Sequence 


A 16- or 32-word write sequence refills a secondary cache block from the System interface 
after a secondary cache miss. A 16-word write sequence is performed when the secondary 
cache block size is 16 words, and a 32-word write sequence is performed when the 
secondary cache block size is 32 words. 


Figure 5-9 depicts a secondary cache 16 or 32-word write sequence. 


[Figure 5-9 timing diagram. Signals shown: SCClk, SC(A,B)Addr(18:0), SCTagLSBAddr, 
SC(A,B)DWay, SCData(127:0), SCDataChk(9:0), SC(A,B)DOE*, SC(A,B)DWr*, 
SC(A,B)DCS*, SCTWay, SCTag(25:0), SCTagChk(6:0), SCTOE*, SCTWr*, SCTCS*.] 


Figure 5-9 16/32-Word Write Sequence 


Tag Write Sequence 


A tag write sequence updates the secondary cache tag array without affecting the data array. 


This sequence is used for the following: 


• to reflect primary cache state changes in the secondary cache 

• for the CACHE Index Store Tag (S) instruction 

• for external coherency requests 


Figure 5-10 depicts the secondary cache tag write protocol. 


Figure 5-10 Tag Write Sequence 
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6. 


System Interface Operations 


The R10000 System interface provides a gateway between the processor, with its associated 
secondary cache, and the remainder of the computer system. 


For convenience, any device communicating with the processor through the System 
interface is referred to as the external agent. 
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6.1 Request and Response Cycles 


The System interface supports the following request and response cycles: 


• Processor requests are generated by the processor when it requires a system 
  resource. 


• External responses are supplied by an external agent in response to a 
  processor request. 


• External requests are generated by an external agent when it requires a 
  resource within the processor. 


• Processor responses are supplied by the processor in response to an external 
  request. 


6.2 System Interface Frequencies 


The System interface operates at SysClk frequency, supplied by the external agent. The 
internal processor clock, PClk, is derived from this same SysClk. 


The SysClkDiv mode bits select a PClk to SysClk divisor of 1, 1.5, 2, 2.5, 3, 3.5, or 4, 
using the formula described in Chapter 7, the section titled “System Interface Clock and 
Internal Processor Clock Domains.” 


6.3 Register-to-Register Operation 


The System interface is designed to operate in the following register-to-register fashion 
with the external agent: 


• all System interface outputs are sourced directly from registers clocked on the 
  rising edge of SysClk 


• all System interface inputs directly feed registers that are clocked on the rising 
  edge of SysClk 


This allows the System interface to run at the highest possible clock frequency. 
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6.4 System Interface Signals 


The R10000 System interface is composed of: 
• 3 arbitration signals 

• 2 flow-control input signals 

• a bidirectional 12-bit command bus 

• a bidirectional 64-bit multiplexed address/data bus 

• a 3-bit state output bus 

• a 5-bit response input bus 


6.5 Master and Slave States 


At any time, the System interface is either in master or slave state. 


In master state, the processor drives the bidirectional System interface signals and is 
permitted to issue processor requests to the external agent. 


In slave state, the processor tristates the bidirectional System interface signals and accepts 
external requests from the external agent. 


6.6 Connecting to an External Agent 


In a uni- or multiprocessor system using dedicated external agents, the System interface 
connects to a single external agent. 


In a multiprocessor system using the cluster bus (see below), the system can connect up to 
four R10000 processors to an external agent. This external agent is referred to as the 
cluster coordinator. 


6.7 Cluster Bus 
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In a multiprocessor system using the cluster bus, the cluster coordinator performs the 
cluster bus arbitration and data flow management. The arbitration scheme assures that 
either one of the processors or the cluster coordinator is master at any given time, while the 
remaining devices are slave. 


A processor request issued by the master processor is observed as an external request by all 
slave R10000 processors, as shown in Figure 6-1. Similarly, a processor coherency data 
response issued by a master processor is observed as an external data response by the slave 
processors. 


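[Figure: a master R10000 and three slave R10000 processors, each with its own System interface, share the cluster bus with the cluster coordinator. The processor request issued by the master appears on the cluster bus as an external request to the slaves.] 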

Figure 6-1 Processor Request Master/Slave Status 


In a multiprocessor system using the cluster bus, a mode bit specifies whether processor 
coherent requests are to target the external agent only, or all processors and the external 
agent. This allows systems with efficient snoopy, duplicate tag, or directory-based 
coherency protocols to be created. 
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6.8 System Interface Connections 


The major System interface connections required for various system configurations are 
presented in this section. 


Uniprocessor System 


Figure 6-2 shows the major System interface connections required for a typical 
uniprocessor system. 


[Figure: the external agent connects to the R10000 through SysReq*, SysGnt*, SysRel*, SysRdRdy*, SysWrRdy*, SysCmd(11:0), SysCmdPar, SysAD(63:0), SysADChk(7:0), SysVal*, SysState(2:0), SysStatePar, SysStateVal*, SysResp(4:0), SysRespPar, and SysRespVal*. The R10000 connects to its secondary cache through SCTWr*, SCTCS*, SCTOE*, SCTag(25:0), SCTagChk(6:0), SCTWay, SCTagLSBAddr, SC(A,B)Addr(18:0), SC(A,B)DWay, SCData(127:0), SCDataChk(9:0), SC(A,B)DWr*, SC(A,B)DCS*, and SC(A,B)DOE*.] 

Figure 6-2 System Interface Connections for Uniprocessor System 
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Multiprocessor System Using Dedicated External Agents 
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Coherent Interconnect 


Figure 6-3 shows the major System interface connections required for a typical 
multiprocessor system using dedicated external agents. 


[Figure: two R10000 processors, each connected to its own dedicated external agent and to its own secondary cache through the same signals as in the uniprocessor system. The two external agents are joined by a coherent interconnect.] 

Figure 6-3 System Interface Connections for Multiprocessor using Dedicated External Agents 
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Multiprocessor System Using the Cluster Bus 


Figure 6-4 presents the major System interface connections required for a typical 
multiprocessor system using the cluster bus. 


[Figure: a cluster coordinator and two R10000 processors share SysRel*, SysVal*, SysRdRdy*, SysWrRdy*, SysCmd(11:0), SysCmdPar, SysAD(63:0), SysADChk(7:0), SysResp(4:0), SysRespPar, and SysRespVal* on the cluster bus. Each processor has its own connections to the cluster coordinator for SysReq*, SysGnt*, SysState(2:0), SysStatePar, and SysStateVal* (SysReq0*, SysGnt0*, SysState0(2:0), SysStatePar0, SysStateVal0* for one processor; SysReq1*, SysGnt1*, SysState1(2:0), SysStatePar1, SysStateVal1* for the other). Each processor also connects to its own secondary cache.] 


Figure 6-4 System Interface Connections for Multiprocessor Using the Cluster Bus 
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6.9 System Interface Requests and Responses 


The System interface supports the following: 
• processor request 
• external response 
• external request 
• processor response 


The following sections describe these request and response types, and their operations. 


Processor Requests 
Processor requests are generated by the processor when it requires a system resource. The 
following processor requests are supported: 
• coherent block read shared request 
• coherent block read exclusive request 
• noncoherent block read request 
• double/single/partial-word read request 
• block write request 
• double/single/partial-word write request 
• upgrade request 
• eliminate request 

Processor write and eliminate requests do not require or expect a response by the external 
agent. However, if an external agent detects an error in a processor write or eliminate 
request, it may use an interrupt to signal the processor. It is not possible to generate precise 
exceptions for processor write and eliminate requests for which an external agent detects 
an error. 


Processor read and upgrade requests require some type of response by the external agent. 
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External Responses 


External responses are supplied by an external agent or another processor in response to a 
processor request. The following external responses are supported: 

• block data response 
• double/single/partial-word data response 
• completion response 


External Requests 


External requests are generated by an external agent when it requires a resource within the 
processor. The following external requests are supported: 

• intervention shared request 
• intervention exclusive request 
• allocate request number request 
• invalidate request 
• interrupt request 

External intervention and invalidate requests require some type of response by the 
processor. 


Processor Responses 


Processor responses are supplied by the processor in response to an external request. The 
following processor responses are supported: 

• coherency state response 
• coherency data response 


Outstanding Requests and Request Numbers 


The processor allows requests and corresponding responses to be split transactions, which 
enables additional processor and external requests to be issued while waiting for a prior 
response. The System interface supports a request number field to link requests with their 
corresponding responses, so responses can be returned out of order. 


The processor allows a maximum of eight outstanding requests on the System interface 
through a 3-bit request number. These outstanding requests may be composed of any mix 
of processor and external requests. 


An individual processor (as opposed to the System interface, above) supports a maximum 
of four outstanding processor requests at any given time. 
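
The two limits above (eight request numbers on the System interface, at most four of them held by one processor's own requests) can be illustrated with a small bookkeeping sketch. This is only a model for reasoning about the limits: the class name, the "lowest free number" policy, and the `source` labels are assumptions made for illustration and are not taken from the R10000 itself.

```python
class RequestNumberPool:
    """Hypothetical model of the 3-bit System interface request numbers."""

    MAX_OUTSTANDING = 8   # a 3-bit request number gives eight values
    MAX_PROCESSOR = 4     # limit on outstanding processor requests per processor

    def __init__(self):
        self.busy = {}    # request number -> "processor" or "external"

    def allocate(self, source):
        """Return a free request number for `source`, or None if none may be used."""
        if source == "processor":
            in_flight = sum(1 for s in self.busy.values() if s == "processor")
            if in_flight >= self.MAX_PROCESSOR:
                return None
        # Assumption: pick the lowest free number; the text does not state a policy.
        for number in range(self.MAX_OUTSTANDING):
            if number not in self.busy:
                self.busy[number] = source
                return number
        return None

    def complete(self, number):
        """Mark a request number free again once its request has completed."""
        self.busy.pop(number, None)
```

Because each response carries its request number, completions may arrive in any order; `complete()` simply frees whichever number is named.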
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Request and Response Relationship 


The relationship between processor and external requests, and their acceptable responses, 
is presented in Table 6-1. The data in this table is given with respect to a single processor, 
in either a uni- or multiprocessor system (independent of cluster/non-cluster configuration). 


Table 6-1 Request and Response Relationship 


Request                                   Acceptable Response Sequences 

Processor block read request              External NACK or ERR completion response 

                                          0 or more external block data responses followed by a 
                                          final external block data response with a coincidental 
                                          or subsequent external ACK, NACK, or ERR completion 
                                          response 

Processor double/single/partial-word      External NACK or ERR completion response 
read request 
                                          0 or more external double/single/partial-word data 
                                          responses followed by a final external 
                                          double/single/partial-word data response with a 
                                          coincidental or subsequent external ACK, NACK, or ERR 
                                          completion response 

Processor block write request             None 

Processor double/single/partial-word      None 
write request 

Processor upgrade request                 External ACK, NACK, or ERR completion response 

                                          0 or more external block data responses followed by a 
                                          final external block data response with a coincidental 
                                          or subsequent external ACK, NACK, or ERR completion 
                                          response 

Processor eliminate request               None 

External intervention request             Processor coherency state response followed by 
                                          processor coherency data response (if DirtyExclusive) 
                                          with a coincidental or subsequent external ACK, NACK, 
                                          or ERR completion response* 

External allocate request number          External ACK, NACK, or ERR completion response* 
request 

External invalidate request               Processor coherency state response followed by 
                                          external ACK, NACK, or ERR completion response* 

External interrupt request                None 

* External completion response is required to free the request number. 
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6.10 System Interface Buffers 


The processor contains the following five buffers to enhance the performance of the System 
interface and to simplify the system design: 

• cluster request buffer 
• cached request buffer 
• incoming buffer 
• outgoing buffer 
• uncached buffer 

These buffers are described in the following sections. 


Cluster Request Buffer 


The System interface contains an 8-entry cluster request buffer. This buffer maintains the 
status of the eight possible outstanding requests on the System interface. When the System 
interface is in master state, and it issues the address cycle of a processor read or upgrade 
request, the processor places an entry into the cluster request buffer. When the System 
interface is in slave state, and an external agent issues an external coherency or allocate 
request number request, it places an entry into the cluster request buffer. 


Once an entry is placed into the cluster request buffer, the associated request number 
transitions from free to busy. An entry remains busy until the processor receives an external 
completion response. Processor requests that are ready to be issued to the System interface 
bus probe the cluster request buffer to detect conflict conditions. 


Cached Request Buffer 


The System interface contains a four-entry cached request buffer. This buffer holds the 
status of the four possible outstanding processor cached requests, including processor block 
read and upgrade requests. The relative order of the requests is maintained in the cached 
request buffer. 


External coherency requests probe the cached request buffer to detect conflict conditions. 




Incoming Buffer 
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The System interface contains an incoming buffer for external block and double/single/ 
partial-word data responses. The four 32-word entries of the incoming buffer correspond to 
the four possible outstanding processor requests. Block data in each entry of the incoming 
buffer is stored in subblock order, beginning with a quadword-aligned address. 


The incoming buffer eliminates the need for the processor to flow-control the external agent 
that is providing the external data responses. Regardless of the cache bandwidth or internal 
resource availability, the external agent may supply external data response data for all 
outstanding read and upgrade requests at the maximum System interface data rate. 


The external agent may issue any number of external data responses for a particular request 
number before issuing a corresponding external completion response. An external data 
response remains in the incoming buffer until a corresponding external completion 
response is received. A former buffered external data response for a particular request 
number is over-written by a subsequent external data response for the same request number. 


An external ACK completion response frees buffered data to be forwarded to the caches and 
other internal resources while an external NACK or ERR completion response purges any 
corresponding buffered data. For minimum latency, the external agent should issue an 
external ACK completion response coincident with the first doubleword of an external data 
response. 
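
The overwrite and completion rules described above can be summarized in a short behavioral sketch. It is a simplified model, not a description of the hardware: the class and method names are hypothetical, each entry is treated as an opaque value, and timing (for example an ACK coincident with the first doubleword) is not modeled.

```python
class IncomingBuffer:
    """Hypothetical behavioral model of the incoming buffer entries."""

    def __init__(self):
        self.entries = {}  # request number -> most recent buffered data response

    def data_response(self, request_number, data):
        # A later data response for the same request number overwrites the earlier one.
        self.entries[request_number] = data

    def completion_response(self, request_number, kind):
        """kind is "ACK", "NACK", or "ERR".

        Returns the data forwarded to the caches on ACK, or None when the
        buffered data is purged (NACK or ERR) or nothing was buffered.
        """
        data = self.entries.pop(request_number, None)
        return data if kind == "ACK" else None
```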


External coherency requests that target blocks residing in the incoming buffer are stalled 
until the incoming buffer data is forwarded to the secondary cache, and the instruction that 
caused the secondary miss is satisfied. 


Each doubleword of the incoming buffer has an Uncorrectable Error flag. When an external 
data response provides a doubleword, the processor asserts the corresponding incoming 
buffer Uncorrectable Error flag if the data quality indicator, SysCmd[5], is asserted, or if 
an uncorrectable ECC error is encountered on the system address/data bus and the ECC 
check indication on SysCmd[0] is asserted. 


When the processor forwards block data from an incoming buffer entry after receiving an 
external ACK completion response, the associated incoming buffer Uncorrectable Error 
flags are checked, and if any are asserted, a single Cache Error exception is posted. When 
the processor forwards double/single/partial-word data from an incoming buffer entry after 
receiving an external ACK completion response, the associated incoming buffer 
Uncorrectable Error flag is checked, and if asserted, a Bus Error exception is posted. 
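
As a compact restatement of the two paragraphs above, the following sketch shows when a flag would be set and which exception would be posted on forwarding. The function names and the "block"/"word" labels are illustrative assumptions; only the SysCmd[5] and SysCmd[0] bit positions come from the text.

```python
def uncorrectable_error_flag(syscmd, ecc_uncorrectable):
    """Flag for one doubleword of an external data response.

    syscmd is the 12-bit SysCmd value of the data cycle; ecc_uncorrectable
    says whether an uncorrectable ECC error was seen on the address/data bus.
    """
    data_bad = bool(syscmd & (1 << 5))   # data quality indicator, SysCmd[5]
    ecc_checked = bool(syscmd & 1)       # ECC check indication, SysCmd[0]
    return data_bad or (ecc_uncorrectable and ecc_checked)


def exception_on_forward(response_kind, error_flags):
    """Exception posted when an entry is forwarded after an external ACK.

    response_kind is "block" or "word" (double/single/partial-word);
    error_flags holds the Uncorrectable Error flags of the entry.
    """
    if not any(error_flags):
        return None
    return "Cache Error" if response_kind == "block" else "Bus Error"
```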


Outgoing Buffer 
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The System interface contains a five-entry outgoing buffer to provide buffering for the 
following: 


• DirtyExclusive blocks that are cast out of the secondary cache because of a 
  block replacement 

• various CACHE instructions 

• an external intervention request. 


Four 32-word typical entries are associated with the four possible outstanding processor 
cached requests allowed by the processor. One 32-word special entry is reserved for 
external intervention requests only. The data is stored in each entry of the outgoing buffer 
in sequential order, beginning with a secondary cache block-aligned address. 


An instruction or data access that misses in the secondary cache but targets an entry in the 
outgoing buffer is stalled until the outgoing buffer entry is issued as a processor block write 
request or coherency data response to the System interface bus. 


External coherency requests probe the four typical outgoing buffer entries, with the 
following results: 


• If an external intervention request hits a typical entry, that entry is converted 
  from a processor block write request to a processor coherency data response. 

• If an external invalidate request hits a typical outgoing buffer entry, that entry 
  is deleted. 

• If an external intervention request does not hit a typical outgoing buffer entry, 
  but hits a DirtyExclusive block in the secondary cache, the special outgoing 
  buffer entry is used to buffer the processor coherency data response. 
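
The three probe outcomes listed above can be expressed as a small decision function. This is a sketch under stated assumptions: the typical entries are modeled as a dictionary keyed by block address, the function and result strings are hypothetical, and the "no effect" result for the remaining cases is an assumption, since the text does not describe them.

```python
def probe_outgoing_buffer(request, typical_entries, address,
                          hits_dirty_exclusive_in_cache=False):
    """Model an external coherency request probing the typical entries.

    request is "intervention" or "invalidate"; typical_entries maps a block
    address to "block write" or "coherency data response".
    """
    if request == "invalidate":
        if address in typical_entries:
            del typical_entries[address]
            return "entry deleted"
        return "no effect on outgoing buffer"  # assumed; not stated in the text
    if request == "intervention":
        if address in typical_entries:
            typical_entries[address] = "coherency data response"
            return "entry converted to coherency data response"
        if hits_dirty_exclusive_in_cache:
            return "special entry buffers coherency data response"
        return "no effect on outgoing buffer"  # assumed; not stated in the text
    raise ValueError("unknown request type: " + request)
```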


A typical outgoing buffer entry containing a block write is ready for issue to the System 
interface bus when the first quadword is received from the secondary cache. The processor 
allows data to stream from the secondary cache to the System interface bus through the 
outgoing buffer. 


An outgoing buffer entry containing a coherency data response is ready for issue to the 
System interface bus when the quadword specified by the corresponding external 
intervention request is received from the secondary cache. The processor then allows the 
data to stream from the secondary cache to the System interface bus through the outgoing 
buffer. 
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Uncached Buffer 
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Each quadword of the outgoing buffer maintains an Uncorrectable Error flag. If an 
uncorrectable error is encountered while a block is being cast out of the secondary cache, 
the associated outgoing buffer quadword Uncorrectable Error flag is asserted. When the 
processor empties an outgoing buffer entry by issuing a processor block write or coherency 
data response, the outgoing buffer Uncorrectable Error flags are reflected by the data 
quality indication on SysCmd[5]. 


The System interface contains an uncached buffer to provide buffering for uncached and 
uncached accelerated load and store operations. All operations retain program order within 
the uncached buffer. 


The uncached buffer is organized as a 4-entry FIFO followed by a 2-entry gatherer. Each 
gathered entry has a capacity of 16 or 32 words, as specified by the SCBlkSize mode bit. 


The uncached buffer begins gathering when an uncached accelerated double or singleword 
block-aligned store is executed. Gathering continues if the subsequent uncached operation 
executed is an uncached accelerated double or singleword store to a sequential or identical 
address. Once a second uncached accelerated store is gathered, the gathering mode is 
determined to be sequential or identical. Gathering continues until one of the following 
conditions occurs: 


• a complete block is gathered 
• an uncached or uncached accelerated load is executed 
• an uncached or uncached accelerated partial-word store is executed 
• an uncached store is executed 
• a change in the current gathering mode is observed 
• a change in the uncached attribute is observed 


When gathering terminates, the data is ready for issue to the System interface bus. A 
processor uncached accelerated block write request is used to issue a completely gathered 
uncached accelerated block. One or more disjoint processor uncached accelerated double 
or singleword write requests are used to issue an incompletely gathered uncached 
accelerated block. 


When gathering in identical mode, uncached accelerated double or singleword stores 
may be freely mixed. The uncached buffer packs the associated data into the gatherer. 
When gathering in sequential mode, uncached accelerated singleword stores must occur in 
pairs, to prevent an address error exception. For instance, SW, SW, SD, SW, SW is legal; 
SD, SW, SD is not. 
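
The pairing rule for sequential mode can be checked mechanically. The sketch below is an illustration only: the function name is hypothetical, the store sequence is given as a list of "SW" and "SD" strings, and it flags only a doubleword store that follows an odd number of singleword stores. Whether a trailing unpaired singleword store at the end of gathering is acceptable is not stated in the text, so the sketch does not flag it.

```python
def sequential_gather_is_legal(stores):
    """Return False if an SD follows an unpaired SW in sequential mode."""
    pending_singlewords = 0
    for op in stores:
        if op == "SW":
            pending_singlewords += 1
        elif op == "SD":
            if pending_singlewords % 2 != 0:
                return False  # singleword stores did not occur in pairs
            pending_singlewords = 0
        else:
            raise ValueError("expected 'SW' or 'SD', got " + repr(op))
    return True
```

Applied to the two sequences in the text, the first is accepted and the second is rejected.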


External coherency requests have no effect on the uncached buffer. 


CACHE instructions have no effect on the uncached buffer. SYNC instructions are 
prevented from graduating if an uncached store resides in the uncached buffer. 
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6.11 System Interface Flow Control 


The System interface supports a maximum request rate of one request per SysClk cycle, 
and a maximum data rate of one doubleword per SysClk cycle. 


Various flow control mechanisms are provided to limit these rates, as described below. 


Processor Write and Eliminate Request Flow Control 


The processor can only issue a processor write or eliminate request if: 
• the System interface is in master state 
• SysWrRdy* was asserted two SysClk cycles previously 


Processor Read and Upgrade Request Flow Control 


The processor can only issue a processor read or upgrade request if: 
• the System interface is in master state 
• SysRdRdy* was asserted two SysClk cycles previously 
• the maximum number of outstanding processor requests specified by the 
  PrcReqMax mode bits is not exceeded 
• there is a free request number 
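
The four conditions above combine as a simple conjunction. The following sketch assumes a per-cycle history of whether SysRdRdy* was asserted (the signal is active low, so `True` here means "asserted", not "high"); the function and parameter names are hypothetical, and `prc_req_max` stands for the limit selected by the PrcReqMax mode bits.

```python
def can_issue_read_or_upgrade(in_master_state, sys_rd_rdy_history,
                              outstanding_processor_requests, prc_req_max,
                              free_request_numbers):
    """Check the read/upgrade issue conditions for the current SysClk cycle.

    sys_rd_rdy_history[-1] is the current cycle, so index -3 is the sample
    taken two SysClk cycles previously.
    """
    ready_two_cycles_ago = (len(sys_rd_rdy_history) >= 3
                            and sys_rd_rdy_history[-3])
    return bool(in_master_state
                and ready_two_cycles_ago
                and outstanding_processor_requests < prc_req_max
                and free_request_numbers > 0)
```

The write, eliminate, and coherency data response checks have the same shape, using SysWrRdy* and omitting the last two conditions.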


Processor Coherency Data Response Flow Control 


The processor can only issue a processor coherency data response if: 
• the System interface is in master state 
• SysWrRdy* was asserted two SysClk cycles previously 


External Request Flow Control 


When the System interface is in slave state, it is capable of accepting external requests. An 
external agent may issue external requests in adjacent SysClk cycles. 


External Data Response Flow Control 


Since the processor has an incoming buffer, an external agent may supply external data 
response data in adjacent SysClk cycles, without regard to cache bandwidth or internal 
resource availability. 
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6.12 System Interface Block Data Ordering 


During block data transfers on the System interface SysAD[63:0] bus, even doublewords 
(Dat0, Dat2,...) always correspond to SCData[127:64], and odd doublewords (Dat1, 
Dat3,...) always correspond to SCData[63:0]. 


External Block Data Responses 


During the address cycle of processor block read and upgrade requests, the processor 
specifies a quadword-aligned address. The processor expects the external block data 
response to be supplied in a subblock order sequence, beginning at the specified quadword- 
aligned address. 


Processor Coherency Data Responses 


The addresses of external intervention requests are internally aligned by the processor to a 
quadword address. If the processor determines that it must issue a processor coherency data 
response, it supplies the data in a subblock order sequence beginning at the quadword-aligned 
address specified by the corresponding external coherency request. 


Processor Block Write Requests 
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During the address cycle of processor block write requests, the processor specifies a cache 
block-aligned address. During the subsequent data cycles for typical processor block write 
requests, the processor supplies the data in sequence, beginning with the secondary cache 
block-aligned address. 
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6.13 System Interface Bus Encoding 
This section presents the encoding of the following four System interface buses: 
• SysCmd[11:0] 
• SysAD[63:0] 
• SysState[2:0] 
• SysResp[4:0] 


SysCmd[11:0] Encoding 
This section describes address and data cycle encodings for the system command bus, 
SysCmd[11:0]. 
SysCmd[11] Encoding 


When SysVal* is asserted, SysCmd[11] indicates whether the SysAD[63:0] bus represents 
an address or a data cycle, as shown in Table 6-2. 


Table 6-2 Encoding of SysCmd[11] 


SysCmd[11] Data/Address Cycle Indication 
0 SysAD[63:0] address cycle 
1 SysAD[63:0] data cycle 


SysCmd[10:0] Address Cycle Encoding 


During the address cycle of processor read and upgrade requests, SysCmd[10:8] contain 
the request number, as shown in Table 6-3. The request number provides a mechanism to 
associate an external response with the corresponding processor request. 


Table 6-3 Encoding of SysCmd[10:8] for Processor Read and Upgrade Requests 


SysCmd[10:8] Request Number 
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During the address cycle of processor requests, SysCmd[7:5] contain the command, as 
shown in Table 6-4. 


Table 6-4 Encoding of SysCmd[7:5] for Processor Requests 


SysCmd[7:5] Command 
0 Coherent block read shared 
1 Coherent block read exclusive 
2 Noncoherent block read 
3 Double/single/partial-word read 
4 Block write 
5 Double/single/partial-word write 
6 Upgrade 
7 Special 


During the address cycle of processor read requests, SysCmd[4:3] contain the read cause 
indication, as shown in Table 6-5. This information is useful in handling the associated 
external response. 


Table 6-5 Encoding of SysCmd[4:3] for Processor Read Requests 


SysCmd[4:3] Read Cause Indication 
0 Instruction access 
1 Data typical access 
2 Data LL/LLD access 
3 Data prefetch access 


During the address cycle of processor write requests, SysCmd[4:3] contain the write cause 
indication, as shown in Table 6-6. This information is useful in handling the associated 
write data. 


Table 6-6 Encoding of SysCmd[4:3] for Processor Write Requests 


SysCmd[4:3] Write Cause Indication 
0 Reserved 
1 Data typical access 
2 Data uncached accelerated sequential access 
3 Data uncached accelerated identical access 




During the address cycle of processor upgrade requests, SysCmd[4:3] contain the upgrade 
cause indication, as shown in Table 6-7. This information is useful in handling the associated 
external response. 


Table 6-7 Encoding of SysCmd[4:3] for Processor Upgrade Requests 


SysCmd[4:3] Upgrade Cause Indication 
0 Reserved 
1 Data typical access 
2 Data SC/SCD access 
3 Data prefetch access 


During the address cycle of processor special requests, SysCmd[4:3] contain the processor 
special cause indication, as shown in Table 6-8. This information differentiates between 
the various processor special requests. 


Table 6-8 Encoding of SysCmd[4:3] for Processor Special Requests 


SysCmd[4:3] Special Cause Indication 
0 Reserved 
1 Eliminate 
2 Reserved 
3 Reserved 


During the address cycle of processor block read, typical block write, upgrade, and 
eliminate requests, SysCmd[2:1] contain the secondary cache block former state, as shown 
in Table 6-9. This information may be useful for system designs implementing a duplicate 
tag or a directory-based coherency protocol. 


Table 6-9 Encoding of SysCmd[2:1] for Processor Block Read/Write, Upgrade, Eliminate Requests 


SysCmd[2:1] Secondary Cache Block Former State 
0 Invalid 
1 Shared 
2 CleanExclusive 
3 DirtyExclusive 
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During the address cycle of processor double/single/partial-word read and write requests, 
SysCmd[2:0] contain the data size indication, as shown in Table 6-10. 


Table 6-10 Encoding of SysCmd[2:0] for Processor Double/Single/Partial-Word Read/Write Requests 


SysCmd[2:0] Data Size Indication 
0 One byte valid (Byte) 
1 Two bytes valid (Halfword) 
2 Three bytes valid (Tribyte) 
3 Four bytes valid (Word) 
4 Five bytes valid (Quintibyte) 
5 Six bytes valid (Sextibyte) 
6 Seven bytes valid (Septibyte) 
7 Eight bytes valid (Doubleword) 
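
The field layout in Tables 6-2 through 6-10 can be exercised with a small decoder for processor address cycles. This is an illustrative sketch, not vendor code: the function name and dictionary keys are hypothetical, and the decoder reports the former-state field for every non-word command even though the text defines it only for block read, typical block write, upgrade, and eliminate requests.

```python
PROCESSOR_COMMANDS = [
    "Coherent block read shared",
    "Coherent block read exclusive",
    "Noncoherent block read",
    "Double/single/partial-word read",
    "Block write",
    "Double/single/partial-word write",
    "Upgrade",
    "Special",
]
FORMER_STATES = ["Invalid", "Shared", "CleanExclusive", "DirtyExclusive"]


def decode_processor_address_cycle(syscmd):
    """Split a 12-bit SysCmd value driven during a processor address cycle."""
    if not 0 <= syscmd < (1 << 12):
        raise ValueError("SysCmd is a 12-bit bus")
    if (syscmd >> 11) & 1:
        raise ValueError("SysCmd[11] = 1 indicates a data cycle")
    command = (syscmd >> 5) & 0x7
    fields = {
        "command": PROCESSOR_COMMANDS[command],
        "cause": (syscmd >> 3) & 0x3,  # read/write/upgrade/special cause
    }
    if command in (0, 1, 2, 3, 6):
        # Only read and upgrade requests carry a request number.
        fields["request_number"] = (syscmd >> 8) & 0x7
    if command in (3, 5):
        # Double/single/partial-word requests: SysCmd[2:0] = valid bytes - 1.
        fields["data_size_bytes"] = (syscmd & 0x7) + 1
    else:
        fields["former_state"] = FORMER_STATES[(syscmd >> 1) & 0x3]
    return fields
```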


During the address cycle of external intervention and invalidate requests, SysCmd[10:8] 
contain the request number, as shown in Table 6-11. The request number provides a 
mechanism to associate a potential processor coherency data response with the 
corresponding external coherency request. 


Table 6-11 Encoding of SysCmd[10:8] for External Intervention and Invalidate Requests 


SysCmd[10:8] Request Number 


During the address cycle of external requests, SysCmd[7:5] contain the command, as 
shown in Table 6-12. 


Table 6-12 Encoding of SysCmd[7:5] for External Requests 


SysCmd[7:5] Command 
0 Intervention shared 
1 Intervention exclusive 
2 Allocate request number 
3 Allocate request number 
4 NOP 
5 NOP 
6 Invalidate 
7 Special 


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During the address cycle of external special requests, SysCmd[4:3] contain the external 
special cause indication, as shown in Table 6-13. This information is used to differentiate 
between the various external special requests. 


Table 6-13 Encoding of SysCmd[4:3] for External Special Requests 


SysCmd[4:3] Special Cause Indication 
0 Reserved 
1 NOP 
2 Interrupt 
3 Reserved 


During external address cycles, SysCmd[0] specifies whether ECC checking and 
correcting is to be performed for the SysAD[63:0] bus, as shown in Table 6-14. During the 
address cycle of processor block read, data typical block write, upgrade, and eliminate 
requests, the processor asserts SysCmd[0]. Consequently, in a multiprocessor system using 
the cluster bus, ECC checking and correcting is enabled for external coherency requests 
resulting from processor coherent block read and upgrade requests. 


Table 6-14 Encoding of SysCmd[0] for External Address Cycles 


SysCmd[0] ECC check indication 
0 ECC checking and correcting disable 
1 ECC checking and correcting enable 


SysCmd[10:0] Data Cycle Encoding 


During the data cycles of an external data response or a processor coherency data response, 
SysCmd[10:8] contain the request number associated with the original request, as shown 
in Table 6-15. 


Table 6-15 Encoding of SysCmd[10:8] for Data Responses 


SysCmd[10:8] Request Number 


During data cycles, SysCmd[5] indicates the data quality, as shown in Table 6-16. 


Table 6-16 Encoding of SysCmd[5] for Data Cycles 


SysCmd[5] Data quality indication 
0 Data is good 
1 Data is bad 
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During data cycles, SysCmd[4:3] indicate the data type, as shown in Table 6-17. Processor 
block write and double/single/partial-word write requests use request data and request last 
data type indications. External data and processor coherency data responses use response 
data and response last data type indications. 


Table 6-17 Encoding of SysCmd[4:3] for Data Cycles 


SysCmd[4:3] Data type Indication 
0 Request data 
1 Response data 
2 Request last 
3 Response last 


During data cycles of an external block data response or processor coherency data response, 
SysCmd[2:1] contain the state of the cache block, as shown in Table 6-18. 


Table 6-18 Encoding of SysCmd[2:1] for Block Data Responses 


SysCmd[2:1] Cache Block State 
0 Reserved 
1 Shared 
2 CleanExclusive 
3 DirtyExclusive 


During data cycles, SysCmd[0] specifies whether ECC checking and correcting is to be 
performed for the SysAD[63:0] bus, as shown in Table 6-19. During processor data cycles, 
the processor asserts SysCmd[0]. Consequently, in a multiprocessor system using the 
cluster bus, ECC checking and correcting will be enabled for external block data responses 
resulting from processor coherency data responses. 


Table 6-19 Encoding of SysCmd[0] for External Data Cycles 


SysCmd[0] ECC check indication 
0 ECC checking and correcting disable 
1 ECC checking and correcting enable 
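
The data-cycle fields in Tables 6-15 through 6-19 can be unpacked in the same way. The sketch below uses hypothetical names and decodes every field unconditionally; per the text, the request number is meaningful for data responses and the block state for block data and coherency data responses.

```python
DATA_TYPES = ["Request data", "Response data", "Request last", "Response last"]
BLOCK_STATES = ["Reserved", "Shared", "CleanExclusive", "DirtyExclusive"]


def decode_data_cycle(syscmd):
    """Split a 12-bit SysCmd value driven during a data cycle."""
    if not 0 <= syscmd < (1 << 12):
        raise ValueError("SysCmd is a 12-bit bus")
    if not (syscmd >> 11) & 1:
        raise ValueError("SysCmd[11] = 0 indicates an address cycle")
    return {
        "request_number": (syscmd >> 8) & 0x7,          # SysCmd[10:8]
        "data_bad": bool((syscmd >> 5) & 1),            # SysCmd[5]
        "data_type": DATA_TYPES[(syscmd >> 3) & 0x3],   # SysCmd[4:3]
        "block_state": BLOCK_STATES[(syscmd >> 1) & 0x3],  # SysCmd[2:1]
        "ecc_check": bool(syscmd & 1),                  # SysCmd[0]
    }
```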


SysCmd[11:0] Map 
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Table 6-20 presents a map for the SysCmd[11:0] bus. 


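Table 6-20 SysCmd[11:0] Map 


Processor address cycles (SysCmd[11] = 0) 
    Coherent block read shared, coherent block read exclusive, noncoherent block read: 
        request number [10:8], command [7:5], read cause [4:3], block state [2:1] 
    Double/single/partial-word read: 
        request number [10:8], command [7:5], read cause [4:3], data size [2:0] 
    Block write: 
        command [7:5], write cause [4:3], block state [2:1] 
    Double/single/partial-word write: 
        command [7:5], write cause [4:3], data size [2:0] 
    Upgrade: 
        request number [10:8], command [7:5], upgrade cause [4:3], block state [2:1] 
    Special (eliminate): 
        command [7:5], special cause [4:3], block state [2:1] 

Processor data cycles (SysCmd[11] = 1) 
    Double/single/partial-word write, block write: 
        data quality [5], data type [4:3] 
    Coherency data response: 
        request number [10:8], data quality [5], data type [4:3], block state [2:1] 

External address cycles (SysCmd[11] = 0) 
    Intervention shared, intervention exclusive, allocate request number, invalidate: 
        request number [10:8], command [7:5], ECC check [0] 
    NOP: 
        command [7:5] 
    Special (interrupt, NOP): 
        command [7:5], special cause [4:3] 

External data cycles (SysCmd[11] = 1) 
    Block data response: 
        request number [10:8], data quality [5], data type [4:3], block state [2:1], ECC check [0] 
    Double/single/partial-word data response: 
        request number [10:8], data quality [5], data type [4:3], ECC check [0] 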
SysAD[63:0] Encoding 

This section describes the system address/data bus encoding. 


SysAD[63:0] Address Cycle Encoding 


Table 6-21 presents the encoding of the SysAD[63:0] bus for address cycles. 


Table 6-21 Encoding of SysAD[63:0] for Address Cycles 


SysAD[63:60]   Target indication 
  SysAD[63]    Target processor with DevNum = 3 
  SysAD[62]    Target processor with DevNum = 2 
  SysAD[61]    Target processor with DevNum = 1 
  SysAD[60]    Target processor with DevNum = 0 
SysAD[59:58]   Uncached attribute 
SysAD[57]      Secondary cache block way indication 
SysAD[56:40]   Reserved 
SysAD[39:0]    Physical address 


SysAD[63:60] 


During the address cycle of processor noncoherent block read, double/single/partial-word 
read, block write, double/single/partial-word write, and eliminate requests, the processor 
always drives a target indication of 0 on SysAD[63:60]. This indicates that the request 
targets the external agent only. When the CohPrcReqTar mode bit is negated, during the 
address cycle of processor coherent block read and upgrade requests, the processor also 
drives a target indication of 0 on SysAD[63:60]. However, when the CohPrcReqTar mode 
bit is asserted, during the address cycle of processor coherent block read and upgrade 
requests, the processor drives a target indication of 0xF on SysAD[63:60]. This indicates 
that the request targets all processors, together with the external agent, on the cluster bus. 
In multiprocessor systems using the cluster bus, the CohPrcReqTar mode bit is asserted 
for a snoopy-based coherency protocol, and negated for a duplicate tag or directory-based 
coherency protocol. 


When the processor is in slave state, an external agent uses the target indication field to 
specify which processors are targets of an external request. 


SysAD[59:58] Uncached Attribute 


During the address cycle of processor double/single/partial-word read and write requests 
and during the address cycle of processor Uncached accelerated block write requests, the 
processor drives the uncached attribute onto SysAD[59:58]. See the section titled, Support 
for Uncached Attribute, in this chapter for more information. 
SysAD[57] 


During the address cycle of processor block read, typical block write, upgrade, and 
eliminate requests, SysAD[57] contains the secondary cache block way indication. This 
information may be useful for system designs implementing a duplicate tag or a 
directory-based coherency protocol. 


SysAD[56:40] 


When the processor is in master state, it drives SysAD[56:40] to zero during address cycles. 


SysAD[39:0] 


During the address cycle of processor and external requests, SysAD[39:0] contain the 
physical address. 
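
The address cycle fields of Table 6-21 can be separated with shifts and masks. The
following Python sketch is a minimal illustration; the function name and dictionary keys
are assumptions for this example, and the bit positions are those listed in Table 6-21.

```python
def decode_sysad_address_cycle(sysad):
    """Split a 64-bit SysAD address-cycle value into its fields (hypothetical helper)."""
    target_bits = (sysad >> 60) & 0xF
    return {
        # SysAD[60 + n] set means the processor with DevNum = n is a target.
        "targets": [n for n in range(4) if target_bits & (1 << n)],
        "uncached_attr": (sysad >> 58) & 0x3,   # SysAD[59:58]
        "sc_way": (sysad >> 57) & 0x1,          # SysAD[57]
        "phys_addr": sysad & ((1 << 40) - 1),   # SysAD[39:0]
    }
```

A target indication of 0 yields an empty target list (the request targets the external
agent only), while 0xF lists all four device numbers.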


Table 6-22 presents the processor request address cycle address alignment. 


Table 6-22 Processor Request Address Cycle Alignment 


Processor Request Type                   Address Alignment        Address Bits 
Block read                               Quadword                 3:0 
Doubleword read/write                    Doubleword               2:0 
Singleword read/write                    Singleword               1:0 
Halfword read/write                      Halfword                 0 
Byte, tribyte, quintibyte, sextibyte,    Byte                     - 
septibyte read/write 
Block write                              Secondary cache block    5:0 (SCBlkSize = 0) 
                                                                  6:0 (SCBlkSize = 1) 
Upgrade                                  Quadword                 3:0 
Eliminate                                Secondary cache block    5:0 (SCBlkSize = 0) 
                                                                  6:0 (SCBlkSize = 1) 


Table 6-23 presents the external coherency request address cycle address alignment. 


Table 6-23 External Coherency Request Address Cycle Alignment 


                                                  Address Bits Which 
External Request Type    Address Alignment        Are Ignored 
Intervention             Quadword                 3:0 
Invalidate               Secondary cache block    5:0 (SCBlkSize = 0) 
                                                  6:0 (SCBlkSize = 1) 
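
The ignored-bit ranges of Table 6-23 amount to masking off low-order address bits. The
Python sketch below illustrates this under stated assumptions: the function names and the
string request-type labels are invented for the example, and only the bit ranges come from
the table.

```python
def ignored_low_bits(request_type, sc_blk_size):
    """Number of low-order address bits ignored for an external coherency request."""
    if request_type == "intervention":
        return 4                                # quadword: bits 3:0
    if request_type == "invalidate":
        return 6 if sc_blk_size == 0 else 7     # bits 5:0 or bits 6:0
    raise ValueError("unknown request type: %r" % (request_type,))


def effective_address(addr, request_type, sc_blk_size):
    """Address with the ignored low-order bits cleared (hypothetical helper)."""
    mask = (1 << ignored_low_bits(request_type, sc_blk_size)) - 1
    return addr & ~mask
```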
SysAD[63:0] Data Cycle Encoding 


SysState[2:0] Encoding 
During System interface data cycles, when less than a doubleword is transferred on the 
SysAD[63:0] bus, the valid byte lanes depend on the request address and the MemEnd 
mode bit. 


For example, consider the data cycle for a byte request whose address modulo 8 is 1. When 
MemEnd is negated (little endian), the SysAD[15:8] byte lane is valid. When MemEnd is 
asserted (big endian), the SysAD[55:48] byte lane is valid. 
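
The byte example generalizes to any byte address. The following Python sketch is one way
to compute the lane; the function name and the boolean argument are assumptions for this
illustration, and it covers single-byte requests only.

```python
def valid_byte_lane(address, mem_end_asserted):
    """Return (msb, lsb) of the SysAD byte lane that carries a single-byte transfer.

    mem_end_asserted is True for big endian (MemEnd asserted) and False for
    little endian (MemEnd negated).
    """
    offset = address % 8
    lane = 7 - offset if mem_end_asserted else offset
    return (8 * lane + 7, 8 * lane)
```

With an address whose value modulo 8 is 1, this returns (15, 8) for little endian and
(55, 48) for big endian, matching the example above.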


The processor provides a processor coherency state response by driving the targeted 
secondary cache block tag quality indication on SysState[2], driving the targeted secondary 
cache block former state on SysState[1:0], and asserting SysStateVal* for one SysClk 
cycle. Table 6-24 presents the encoding of the SysState[2:0] bus when SysStateVal* is 
asserted. 


Table 6-24 Encoding of SysState[2:0] When SysStateVal* Asserted 


SysState[2] Secondary cache block tag quality indication 

0 Tag is good 
1 Tag is bad 

SysState[1:0] Secondary cache block former state 
0 Invalid 
1 Shared 
2 CleanExclusive 
3 DirtyExclusive 


When SysStateVal* is negated, SysState[0] indicates if a processor coherency data 
response is ready for issue. Table 6-25 presents the encoding of the SysState[2:0] bus when 
SysStateVal* is negated. 


Table 6-25 Encoding of SysState[2:0] When SysStateVal* Negated 


SysState[2:1] Reserved 
0 Reserved 
1 Reserved 
2 Reserved 
3 Reserved 

SysState[0] Processor coherency data response indication 
0 Not ready for issue 
1 Ready for issue 
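
Because the meaning of SysState[2:0] depends on SysStateVal*, a decoder has to branch on
that signal first. The Python sketch below follows Tables 6-24 and 6-25; the function
name, the boolean argument, and the dictionary keys are assumptions for this example.

```python
FORMER_STATE = ("Invalid", "Shared", "CleanExclusive", "DirtyExclusive")


def decode_sysstate(sysstate, sysstateval_asserted):
    """Interpret SysState[2:0] (hypothetical helper)."""
    if sysstateval_asserted:
        # Table 6-24: coherency state response.
        return {
            "tag_good": ((sysstate >> 2) & 0x1) == 0,      # SysState[2]: 0 = good
            "former_state": FORMER_STATE[sysstate & 0x3],  # SysState[1:0]
        }
    # Table 6-25: only SysState[0] is defined; SysState[2:1] are reserved.
    return {"data_response_ready": bool(sysstate & 0x1)}
```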


SysResp[4:0] Encoding 


An external agent issues an external completion response by driving the request number 
associated with the corresponding request on SysResp[4:2], driving the completion 
indication on SysResp[1:0], and asserting SysRespVal* for one SysClk cycle. Table 6-26 
presents the encoding of the SysResp[4:0] bus. 


Table 6-26 Encoding of SysResp[4:0] 


SysResp[4:2] Request number 
SysResp[1:0] Completion indication 
0 Acknowledge (ACK) 
1 Error (ERR) 
2 Negative acknowledge (NACK) 
3 Reserved 


6.14 Interrupts 


The processor supports five hardware, two software, one timer, and one nonmaskable 
interrupt. The Interrupt exception is described in Chapter 17, the section titled “Interrupt 
Exception.” 


Hardware Interrupts 


Five hardware interrupts are accessible to an external agent via external interrupt requests. 


An external interrupt request consists of a single address cycle on the System interface. 
During the address cycle, SysAD[63:60] specify the target indication, which allows an 
external agent to define the target processors of the external interrupt request. If a processor 
determines it is an external interrupt request target, SysAD[20:16] are the write enables for 
the five individual Interrupt register bits and SysAD[4:0] are the values to be written into 
these bits, as shown in Figure 6-5. This allows any subset of the Interrupt register bits to 
be set or cleared with a single external interrupt request. 
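
The write-enable mechanism is a masked update of a 5-bit register. The following Python
sketch models it; the function name is an assumption, and the model ignores the target
indication check that precedes the update.

```python
def apply_external_interrupt(interrupt_reg, sysad):
    """Return the new 5-bit Interrupt register value after an external interrupt request."""
    write_enables = (sysad >> 16) & 0x1F   # SysAD[20:16]
    values = sysad & 0x1F                  # SysAD[4:0]
    kept = interrupt_reg & ~write_enables & 0x1F
    return kept | (values & write_enables)
```

For instance, with write enables 00011 and values 00010, bit 0 is cleared, bit 1 is set,
and bits 4:2 keep their previous contents.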


The Interrupt register is an architecturally transparent, level-sensitive register that is 
directly readable as bits 14:10 of the Cause register. Since it is level-sensitive, an interrupt 
bit must remain asserted until the interrupt is taken, at which time the interrupt handler must 
cause a second external interrupt request to clear the bit. 


The processor clears the Interrupt register during any of the reset sequences. 
[Figure 6-5: diagram of Cause(15:8). IP[0] and IP[1] (Cause bits 8 and 9) are the software 
interrupts. The hardware interrupt bits, starting at IP[2] (Cause bit 10), form the Interrupt 
register and are written with the SysAD(4:0) interrupt values under control of the 
SysAD(20:16) write enables. IP[7] is the timer interrupt.] 

Figure 6-5 Hardware Interrupts 


Software Interrupts 


The two software interrupts are accessible as bits 9:8 of the Cause register, as shown in 
Figure 6-5. An MTC0 instruction is used to write these bits. 


Timer Interrupt 


The timer interrupt is accessible as bit 15 of the Cause register, IP[7], as shown in Figure 
6-5. This bit is set when one of the following occurs: 

• the Count register is equal to the Compare register 
• either one of the two performance counters overflows 


Nonmaskable Interrupt 


A nonmaskable interrupt is accessible to an external agent as the SysNMI* signal. To post 
a nonmaskable interrupt, an external agent asserts SysNMI* for at least one SysClk cycle. 


The processor recognizes the nonmaskable interrupt on the first SysClk cycle that 
SysNMI* is asserted. After the nonmaskable interrupt is serviced, an external agent may 
post another nonmaskable interrupt by first negating SysNMI* for at least one SysClk 
cycle, and reasserting SysNMI* for at least one SysClk cycle. 
6.15 Protocol Abbreviations 


The following abbreviations are used in the System interface protocols: 


SysCmd[11:0] Abbreviations 


Cmd       Unspecified command 
BlkRd     Block read request command 
RdShd     Coherent block read shared request command 
RdExc     Coherent block read exclusive request command 
DSPRd     Double/single/partial-word read command 
BlkWr     Block write request command 
DSPWr     Double/single/partial-word write request command 
Ugd       Upgrade request command 
Elm       Eliminate request command 
IvnShd    Intervention shared request command 
IvnExc    Intervention exclusive request command 
Ale       Allocate request number command 
Ivd       Invalidate request command 
Int       Interrupt request command 
ExtCoh    External coherency request command 
ReqDat    Request data 
RspDat    Response data 
ReqLst    Request last 
RspLst    Response last 
Empty     Empty; SysCmd(11:0) and SysAD(63:0) are undefined 


SysAD[63:0] Abbreviations 


Adr       Physical address 
Dat       Unspecified data 
Dat<n>    Doubleword n of a block 


SysState[2:0] Abbreviations 


State     Unspecified state 
Ivd       Invalid 
Shd       Shared 
ClnExc    CleanExclusive 
DrtExc    DirtyExclusive 


SysResp[4:0] Abbreviations 


Rsp       Unspecified completion response 
ACK       Acknowledge completion response 
ERR       Error completion response 
NACK      Negative acknowledge completion response 


Master Abbreviations 


EA        External agent 
Pn        R10000 processor whose device number is n 
-         Dead cycle 


6.16 System Interface Arbitration 
The processor supports a simple System interface arbitration protocol, which relies on an 
external arbiter. This protocol is used in uniprocessor systems, multiprocessor systems 
using dedicated external agents, and multiprocessor systems using the cluster bus. System 
interface arbitration is handled by the SysReq*, SysGnt*, and SysRel* signals (request, 
grant, and release). 


As described earlier in this chapter, the System interface resides in either master or slave 
state; the processor enters slave state during all of the reset sequences. 


When mastership of the System interface changes, there is always one dead SysClk cycle 
during which the bidirectional signals are not driven; the processor ignores all bidirectional 
signals during this dead SysClk cycle. 


The protocol supports overlapped arbitration which allows arbitration to occur in parallel 
with requests and responses. This results in fewer wasted cycles when mastership of the 
System interface changes. 


Grant parking is also supported, allowing a device to retain mastership of the System 
interface as long as no other device requests the System interface. 


In multiprocessor systems using the cluster bus, the external arbiter typically implements a 
round-robin priority scheme. 
System Interface Arbitration Rules 


The rules for the System interface arbitration are listed below: 


• If the System interface is in slave state, and a processor request or coherency 
  data response is ready for issue, and the required resources are available (e.g., 
  a free request number, SysRdRdy* asserted, etc.), the processor asserts 
  SysReq*. The processor will not assert SysReq* unless all of the above 
  conditions are met. 

• The processor waits for the assertion of SysGnt*. 

• When the processor observes the assertion of SysGnt*, it negates SysReq* 
  two SysClk cycles later. Once the processor asserts SysReq*, it does not 
  negate SysReq* until the assertion of SysGnt*, even if the need for the 
  System interface bus is contravened by an external coherency request. 

• When the processor observes the assertion of SysRel*, it enters master state 
  two SysClk cycles later, and begins to drive the System interface bus. 
  SysRel* may be asserted coincidentally with or later than SysGnt*. 

• Once in master state, the processor does not relinquish mastership of the 
  System interface until it observes the negation of SysGnt*. 

• The processor indicates it is relinquishing mastership of the System interface 
  bus by asserting SysRel* for one SysClk cycle, two or more SysClk cycles 
  after the negation of SysGnt*. The processor may issue any type of processor 
  request or coherency data response in the two SysClk cycles following the 
  negation of SysGnt*. This may delay the assertion of SysRel*. 
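
The two-cycle delays in the rules above can be traced with a very small model. The Python
sketch below is a simplification, not a description of the hardware: it assumes the
processor has already asserted SysReq*, represents each signal as a list with one boolean
per SysClk cycle (True meaning asserted), and assumes that only a SysRel* assertion at or
after the first SysGnt* assertion counts. The function name and return keys are invented
for this example.

```python
def acquire_mastership(gnt, rel):
    """Return the cycles at which SysReq* is negated and master state is entered.

    gnt[c] and rel[c] are True in cycles where SysGnt* and SysRel* are asserted.
    Returns None if the grant or the release is never observed.
    """
    first_gnt = next((c for c, g in enumerate(gnt) if g), None)
    if first_gnt is None:
        return None
    first_rel = next((c for c in range(first_gnt, len(rel)) if rel[c]), None)
    if first_rel is None:
        return None
    return {
        "sysreq_negated": first_gnt + 2,   # two SysClk cycles after SysGnt* is observed
        "master_state": first_rel + 2,     # two SysClk cycles after SysRel* is observed
    }
```

When SysRel* is asserted in the same cycle as SysGnt*, both events fall in the same cycle;
a later SysRel* delays only the entry into master state.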
Uniprocessor System 


Figure 6-6 shows how the System interface arbitration signals are used in a uniprocessor 
system. Note that this same configuration would be used in a multiprocessor system using 
dedicated external agents. 


[Figure 6-6: block diagram of an R10000 processor connected to an external agent by the 
SysReq*, SysGnt*, and SysRel* signals.] 

Figure 6-6 Arbitration Signals for Uniprocessor System 


Figure 6-7 is an example of the operation of the System interface arbitration in a 
uniprocessor system. The Master row in the following figures indicates which device is 
driving the System interface bidirectional signals (P0 and EA in Figure 6-7). When this 
row contains a dash (-), as shown in Cycle 12 of Figure 6-7, mastership of the System 
interface is changing and no device is driving the System interface bidirectional signals 
for this one dead SysClk cycle. 


The external agent generally asserts the SysGnt* signal, which allows the processor to 
issue requests at any time. 


When the external agent needs to return an external data response, it negates SysGnt* for 
a minimum of one cycle, waits for the processor to assert SysRel*, and then begins driving 
the System interface bus after one dead SysClk cycle. 


[Figure 6-7: timing diagram with rows Cycle, SysClk, Master, SysReq*, SysGnt*, SysRel*, 
SysCmd(11:0), and SysVal*.] 

Figure 6-7 Arbitration Protocol for Uniprocessor System 
Multiprocessor System Using Cluster Bus 


Figure 6-8 shows how the System interface arbitration signals are used in a four-processor 
system using the cluster bus. 


Figure 6-8 Arbitration Signals for Multiprocessor System Using the Cluster Bus 


Figure 6-9 is an example of the System interface arbitration in a four-processor system 
using the cluster bus. 


[Figure 6-9: timing diagram with rows Cycle, SysClk, Master, SysReq0*, SysGnt0*, 
SysReq1*, SysGnt1*, SysReq2*, SysGnt2*, SysReq3*, SysGnt3*, SysRel*, SysCmd(11:0), 
and SysVal*.] 

Figure 6-9 Arbitration Protocol for Multiprocessor System Using the Cluster Bus 
6.17 System Interface Request and Response Protocol 


The following sections detail the System interface request and response protocol. A 
32-word secondary cache block size is assumed in the examples below. 


Processor Request Protocol 


A processor request is generated when the R10000 processor requires a system resource. 


The processor may only issue a processor request when the System interface is in master 
state. If the System interface is in master state, the processor may issue a processor request 
immediately. Processor requests may occur in adjacent SysClk cycles. If the System 
interface is not in master state, the processor must first assert SysReq*, and then wait for 
the external agent to relinquish mastership of the System interface bus by asserting 
SysGnt* and SysRel*. 


When multiple, nonconflicting processor requests and/or coherency data responses are 
ready and meet all issue requirements, the processor uses the following priority: 


• block read and upgrade requests have the highest priority, followed by 
• processor coherency data responses, 
• processor eliminate and typical block write requests, 
• processor double/single/partial-word read/write and uncached accelerated 
  block write requests, which have the lowest priority. 
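
The priority order above can be expressed as a simple selection over the set of items that
are ready to issue. The Python sketch below is illustrative; the class labels are names
invented for this example, not identifiers from the processor documentation.

```python
# Highest priority first, following the list above (labels are hypothetical).
PRIORITY = [
    "block_read_or_upgrade",
    "coherency_data_response",
    "eliminate_or_typical_block_write",
    "dsp_read_write_or_uncached_accelerated_block_write",
]


def select_next(ready):
    """Pick the highest-priority class among those ready for issue, or None."""
    if not ready:
        return None
    return min(ready, key=PRIORITY.index)
```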
Processor Block Read Request Protocol 


A processor block read request results from a cached instruction fetch, load, store, or 
prefetch that misses in the secondary cache. Before issuing a processor block read request, 
the processor changes the secondary cache state to Invalid. Additionally, if the secondary 
cache block former state was DirtyExclusive, a write back is scheduled. Note that if the 
processor block read request receives an external NACK or ERR completion response, the 
secondary cache block state remains Invalid. 


The processor issues a processor block read request with a single address cycle. The 
address cycle consists of the following: 


• negating SysCmd[11] 
• driving a free request number on SysCmd[10:8] 
• driving the block read command on SysCmd[7:5] 
• driving the read cause indication on SysCmd[4:3] 
• driving the secondary cache block former state on SysCmd[2:1] 
• asserting SysCmd[0] 
• driving the target indication on SysAD[63:60] 
• driving the secondary cache block way on SysAD[57] 
• driving the physical address on SysAD[39:0] 
• asserting SysVal* 
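
The address cycle fields listed above can be packed into SysCmd and SysAD values as
follows. This Python sketch is a minimal illustration: the function and parameter names
are assumptions, and the command and read cause encodings are left to the caller because
they are defined in the encoding tables rather than in this list.

```python
def block_read_address_cycle(req_num, command, read_cause, former_state,
                             target, way, phys_addr):
    """Pack the SysCmd[11:0] and SysAD[63:0] values for a block read address cycle."""
    syscmd = (
        (0 << 11)                      # SysCmd[11] negated: address cycle
        | ((req_num & 0x7) << 8)       # SysCmd[10:8] request number
        | ((command & 0x7) << 5)       # SysCmd[7:5] block read command
        | ((read_cause & 0x3) << 3)    # SysCmd[4:3] read cause
        | ((former_state & 0x3) << 1)  # SysCmd[2:1] former block state
        | 1                            # SysCmd[0] asserted
    )
    sysad = (
        ((target & 0xF) << 60)               # SysAD[63:60] target indication
        | ((way & 0x1) << 57)                # SysAD[57] secondary cache block way
        | (phys_addr & ((1 << 40) - 1))      # SysAD[39:0] physical address
    )
    return syscmd, sysad
```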


The processor may only issue a processor block read request address cycle when the 
following are true: 


• the System interface is in master state 
• SysRdRdy* was asserted two SysClk cycles earlier 
• there is no conflicting entry in the outgoing buffer 
• the maximum number of outstanding processor requests specified by the 
  PrcReqMax mode bits is not exceeded 
• there is a free request number 
• the processor is not the target of a conflicting outstanding external coherency 
  request 


A single processor may have as many as four processor block read requests outstanding on 
the System interface at any given time. 


Figure 6-10 depicts four processor block read requests. Since the System interface is 
initially in slave state, the processor must first assert SysReq* and then wait until the 
external agent relinquishes mastership of the System interface by asserting SysGnt* and 
SysRel*. 


Figure 6-10 Processor Block Read Request Protocol 


Processor Double/Single/Partial-Word Read Request Protocol 


A processor double/single/partial-word read request results from an uncached instruction 
fetch or load. 


The processor issues a processor double/single/partial-word read request with a single 
address cycle. The address cycle consists of: 


• negating SysCmd[11] 
• driving a free request number on SysCmd[10:8] 
• driving the double/single/partial-word read command on SysCmd[7:5] 
• driving the read cause indication on SysCmd[4:3] 
• driving the data size indication on SysCmd[2:0] 
• driving the target indication on SysAD[63:60] 
• driving the uncached attribute on SysAD[59:58] 
• driving the physical address on SysAD[39:0] 
• asserting SysVal* 


The processor may only issue a processor double/single/partial-word read request address 
cycle when: 


• the System interface is in master state 
• SysRdRdy* was asserted two SysClk cycles previously 
• the maximum number of outstanding processor requests specified by the 
  PrcReqMax mode bits is not exceeded 
• there is a free request number 


A single processor may have a maximum of one processor double/single/partial-word read 
request outstanding on the System interface at any given time. 


Figure 6-11 depicts a processor double/single/partial-word read request. Since the System 
interface is initially in slave state, the processor must first assert SysReq* and then wait 
until the external agent gives up mastership of the System interface by asserting SysGnt* 
and SysRel*. 


Figure 6-11 Processor Double/Single/Partial-Word Read Request Protocol 


Processor Block Write Request Protocol 


A processor block write request results from the following: 


• replacement of a DirtyExclusive secondary cache block due to a load, store, or 
  prefetch secondary cache miss 
• a CACHE Index WriteBack Invalidate (S) or Hit WriteBack Invalidate (S) 
  instruction 
• a completely gathered uncached accelerated block 


As shown in Figure 6-12, the processor issues a processor block write request with a single 
address cycle followed by 8 or 16 data cycles. 


The address cycle consists of the following: 

• negating SysCmd[11] 
• driving the block write command on SysCmd[7:5] 
• driving the write cause indication on SysCmd[4:3] 
• driving the target indication on SysAD[63:60] 
• driving the physical address on SysAD[39:0] 
• asserting SysVal* 


If the processor block write request results from the writeback of a secondary cache block, 
the DirtyExclusive secondary cache block former state is driven on SysCmd[2:1], the 
secondary cache block way is driven on SysAD[57], and SysCmd[0] is asserted. 


If the processor block write request results from a completely gathered uncached 
accelerated block, the uncached attribute is driven on SysAD[59:58] and SysCmd[0] is 
negated. 


Each data cycle consists of the following: 

• asserting SysCmd[11] 
• driving the data quality indication on SysCmd[5] 
• driving the data type indication on SysCmd[4:3] 
• driving the data on SysAD[63:0] 
• asserting SysVal* 


The first 7 or 15 data cycles have a request data type indication, and the last data cycle has 
a request last data type indication. 
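
The count and type of the data cycles follow from the block size and the 64-bit width of
SysAD. The Python sketch below assumes 4-byte words, so that a 16-word block needs 8
data cycles and a 32-word block needs 16; the function name is invented for this example,
and the ReqDat and ReqLst labels are the abbreviations for request data and request last.

```python
def block_write_data_types(sc_block_words):
    """Return the data type indication for each data cycle of a block write."""
    bytes_per_word = 4          # assumption: 32-bit words
    bytes_per_data_cycle = 8    # SysAD[63:0] carries one doubleword per data cycle
    data_cycles = sc_block_words * bytes_per_word // bytes_per_data_cycle
    return ["ReqDat"] * (data_cycles - 1) + ["ReqLst"]
```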


The processor may negate SysVal* between data cycles of a processor block write request 
only if the SCClk frequency is less than half of the SysClk frequency. 


The processor may only issue a processor block write request address cycle when the 
following are true: 


• the System interface is in master state 
• SysWrRdy* was asserted two SysClk cycles previously 
• the processor is not the target of a conflicting outstanding external coherency 
  request 


Figure 6-12 depicts two adjacent processor block write requests. Since the System 
interface is initially in slave state, the processor must first assert SysReq* and then wait 
until the external agent relinquishes mastership of the System interface by asserting 
SysGnt* and SysRel*. 


Figure 6-12 Processor Block Write Request Protocol 


Processor Double/Single/Partial-Word Write Request Protocol 


A processor double/single/partial-word write request results from an uncached store or 
incompletely gathered uncached accelerated block. 


As shown in Figure 6-13, the processor issues a processor double/single/partial-word write 
request with a single address cycle immediately followed by a single data cycle. 


The address cycle consists of the following: 

• negating SysCmd[11] 
• driving the double/single/partial-word write command on SysCmd[7:5] 
• driving the write cause indication on SysCmd[4:3] 
• driving the data size indication on SysCmd[2:0] 
• driving the target indication on SysAD[63:60] 
• driving the uncached attribute on SysAD[59:58] 
• driving the physical address on SysAD[39:0] 
• asserting SysVal* 

The data cycle consists of the following: 

• asserting SysCmd[11] 
• driving the request last data type indication on SysCmd[4:3] 
• driving the write data on SysAD[63:0] 
• asserting SysVal* 


The processor may only issue a processor double/single/partial-word write request address 
cycle when the System interface is in master state and SysWrRdy* was asserted two 
SysClk cycles previously. 


Figure 6-13 depicts three processor double/single/partial-word write requests. Since the 
System interface is initially in slave state, the processor must first assert SysReq* and 
then wait until the external agent relinquishes mastership of the System interface by 
asserting SysGnt* and SysRel*. 


Figure 6-13 Processor Double/Single/Partial-Word Write Request Protocol 


Processor Upgrade Request Protocol 


A processor upgrade request results from a store or prefetch exclusive that hits a Shared 
block in the secondary cache. 


As shown in Figure 6-14, the processor issues a processor upgrade request with a single 
address cycle. This address cycle consists of the following: 


• negating SysCmd[11] 
• driving a free request number on SysCmd[10:8] 
• driving the upgrade command on SysCmd[7:5] 
• driving the upgrade cause indication on SysCmd[4:3] 
• driving the secondary cache block former state on SysCmd[2:1] 
• asserting SysCmd[0] 
• driving the target indication on SysAD[63:60] 
• driving the secondary cache block way on SysAD[57] 
• driving the physical address on SysAD[39:0] 
• asserting SysVal* 


The processor may only issue a processor upgrade request address cycle when the 
following are true: 


• the System interface is in master state 
• SysRdRdy* was asserted two SysClk cycles previously 
• the maximum number of outstanding processor requests specified by the 
  PrcReqMax mode bits is not exceeded 
• there is a free request number 
• the processor is not the target of a conflicting outstanding external coherency 
  request 


A single processor may have as many as four processor upgrade requests outstanding on 
the System interface at any given time. 


Figure 6-14 depicts four processor upgrade requests. Since the System interface is initially 
in slave state, the processor must first assert SysReq* and then wait until the external agent 
relinquishes mastership of the System interface by asserting SysGnt* and SysRel*. 


Figure 6-14 Processor Upgrade Request Protocol 


Processor Eliminate Request Protocol 


A processor eliminate request results from the following: 


• a cached instruction fetch, load, store, or prefetch that misses in the secondary 
  cache and forces the replacement of a Shared or CleanExclusive secondary 
  cache block 
• a CACHE Index WriteBack Invalidate (S), Hit Invalidate (S), or Hit 
  WriteBack Invalidate (S) instruction that forces the invalidation of a Shared or 
  CleanExclusive secondary cache block 
• a CACHE Hit Invalidate (S) instruction that forces the invalidation of a 
  DirtyExclusive secondary cache block 


A processor eliminate request notifies the external agent that a Shared, CleanExclusive, or 
DirtyExclusive block has been eliminated from the secondary cache. Such requests are 
useful for systems implementing a directory-based coherency protocol, and are enabled by 
asserting the PrcElmReq mode bit. 


The processor issues a processor eliminate request with a single address cycle. This address 
cycle consists of the following: 


• negating SysCmd[11] 
• driving the special command on SysCmd[7:5] 
• driving the eliminate special cause indication on SysCmd[4:3] 
• driving the secondary cache block former state on SysCmd[2:1] 
• asserting SysCmd[0] 
• driving the target indication on SysAD[63:60] 
• driving the secondary cache block way on SysAD[57] 
• driving the physical address of the eliminated secondary cache block on 
  SysAD[39:0] 
• asserting SysVal* 


The processor may only issue a processor eliminate request address cycle when the 
following are true: 


• the System interface is in master state 
• SysWrRdy* was asserted two SysClk cycles previously 
• the PrcElmReq mode bit is asserted 
• the processor is not the target of a conflicting outstanding external coherency 
  request 


Figure 6-15 depicts three processor eliminate requests. Since the System interface is 
initially in slave state, the processor must first assert SysReq* and then wait until the 
external agent relinquishes mastership of the System interface by asserting SysGnt* and 
SysRel*. 


Figure 6-15 Processor Eliminate Request Protocol 


Processor Request Flow Control Protocol 


The processor provides the signals SysRdRdy* and SysWrRdy* to allow an external agent 
to control the flow of processor requests. SysRdRdy* controls the flow of processor read 
and upgrade requests whereas SysWrRdy* controls the flow of processor write and 
eliminate requests. 


The processor can only issue a processor read or upgrade request address cycle to the 
System interface if SysRdRdy* was asserted two SysClk cycles previously. Similarly, the 
processor can only issue the address cycle of a processor write or eliminate request to the 
System interface if SysWrRdy* was asserted two SysClk cycles previously. 


To determine the processor request buffering requirements for the external agent, note that 
the processor can issue any combination of processor requests in adjacent SysClk cycles. 
Also, since the System interface operates register-to-register with the external agent, a 
round trip delay of four SysClk cycles occurs between a processor request address cycle 
which prompts the external agent for flow control, and the flow control actually preventing 
any additional processor request address cycles from occurring. Consequently, if the 
maximum number of outstanding processor requests specified by the PrcReqMax mode 
bits is four, the external agent must be able to accept at least four processor read or upgrade 
requests. Also, the external agent must be able to accept at least four processor eliminate 
requests, two processor double/single/partial-word write requests, or one processor block 
write request. 


Figure 6-16 depicts three processor double/single/partial-word write requests and four 
processor block read requests. After sensing the first processor double/single/partial-word 
write request, the external agent negates SysWrRdy*. The external agent must have 
buffering sufficient for one additional processor write request before the flow control takes 
effect. 


The external agent negates SysRdRdy* upon observing the first processor read request. 
The external agent must have buffering sufficient for three additional processor read 
requests before the flow control takes effect. 
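
The buffering figures quoted in this section are consistent with dividing the four-cycle
round trip by the number of SysClk cycles each request occupies and rounding up. The
Python sketch below shows that arithmetic; it is one reading of the numbers in the text,
not a rule stated by the protocol, and the function name and the per-request cycle counts
in the usage note are assumptions.

```python
import math

ROUND_TRIP_SYSCLKS = 4  # round trip delay before flow control takes effect


def min_requests_to_buffer(cycles_per_request):
    """Requests that can start before flow control stops further address cycles."""
    return math.ceil(ROUND_TRIP_SYSCLKS / cycles_per_request)
```

An address-only request (read, upgrade, or eliminate) occupies one cycle and gives four; a
double/single/partial-word write occupies two cycles (address plus data) and gives two; a
block write with 16 data cycles occupies 17 cycles and gives one. For read and upgrade
requests the result is also bounded by the PrcReqMax setting.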


Figure 6-16 Processor Request Flow Control Protocol 


External Response Protocol 


The processor supports two classes of external responses: 


• external data responses provide a double/single/partial-word of data or provide 
  a block of data using the SysAD[63:0] bus 
• external completion responses provide an acknowledge, error, or negative 
  acknowledge indication using the SysResp[4:0] bus 


An external agent may only issue an external data response to the processor when the 
System interface is in slave state. If the System interface is not already in slave state, the 
external agent must first negate SysGnt* and then wait for the processor to assert SysRel*. 
If the System interface is already in slave state, the external agent may issue an external data 
response immediately. 


External data responses may be accepted by the processor in adjacent SysClk cycles and in 
arbitrary order, relative to corresponding processor requests. 


An external agent may issue an external completion response when the System interface is 
in either master or slave state. External completion responses may be accepted by the 
processor in adjacent SysClk cycles and in arbitrary order, relative to the corresponding 
processor requests. 


External Block Data Response Protocol 


An external agent may issue an external block data response in response to a processor 
block read or upgrade request. 


An external agent issues an external block data response with 8 or 16 data cycles. Each data 
cycle consists of the following: 


* asserting SysCmd[11] 
* driving the request number associated with the corresponding processor 
request on SysCmd[10:8] 
* driving the data quality indication on SysCmd[5] 
* driving the data type indication on SysCmd[4:3] 
* driving the cache block state on SysCmd[2:1] 
* driving the ECC check indication on SysCmd[0] 
* driving the data on SysAD[63:0] 
* asserting SysVal* 


The first 7 or 15 data cycles have a response data type indication, and the last data cycle has 
a response last data type indication. The external agent may negate SysVal* between data 
cycles of an external block data response. 
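
The SysCmd field layout listed above can be sketched as a packing function. This is an illustration only: the function name is hypothetical, the field values are passed in as raw integers because their encodings are not given in this section, and leaving SysCmd[7:6] at zero is an assumption.

```python
def pack_data_cycle_cmd(request_number, data_quality, data_type, block_state, ecc_check):
    """Pack SysCmd[11:0] for one data cycle of an external block data response."""
    fields = [
        ("request_number", request_number, 3),
        ("data_quality", data_quality, 1),
        ("data_type", data_type, 2),
        ("block_state", block_state, 2),
        ("ecc_check", ecc_check, 1),
    ]
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
    return (
        (1 << 11)                # SysCmd[11] asserted: data cycle
        | (request_number << 8)  # SysCmd[10:8]
        | (data_quality << 5)    # SysCmd[5]
        | (data_type << 3)       # SysCmd[4:3]
        | (block_state << 1)     # SysCmd[2:1]
        | ecc_check              # SysCmd[0]
    )
```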


External block data response data must be supplied in subblock order, beginning with the 
quadword-aligned address specified by the corresponding processor request. 
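
This section does not define the subblock sequence itself. As a sketch, assuming the XOR-based ordering commonly described for MIPS subblock order (each doubleword index is the starting index XORed with a running count), the doubleword sequence could be generated as follows. The function name and the use of doubleword indices are illustrative choices, not taken from the text.

```python
def subblock_order(start_dw, block_dws):
    """Doubleword indices in assumed XOR-based subblock order.

    start_dw  -- index of the first doubleword; must be quadword-aligned (even)
    block_dws -- doublewords per block (8 or 16 data cycles)
    """
    if block_dws not in (8, 16):
        raise ValueError("block must be 8 or 16 doublewords")
    if start_dw % 2 or not 0 <= start_dw < block_dws:
        raise ValueError("start must be a quadword-aligned index inside the block")
    return [start_dw ^ count for count in range(block_dws)]
```

For example, a start index of 2 in an 8-doubleword block yields the sequence 2, 3, 0, 1, 6, 7, 4, 5 under this assumption.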


External block data responses for processor coherent block read shared or noncoherent 
block read requests may indicate a state of Shared, CleanExclusive, or DirtyExclusive. 
External block data responses for processor coherent block read exclusive or upgrade 
requests may indicate a state of CleanExclusive or DirtyExclusive. 


Figure 6-17 depicts two processor block read requests and the corresponding external block 
data responses. 


Figure 6-17 External Block Data Response Protocol 


External Double/Single/Partial-Word Data Response Protocol 


An external agent may issue an external double/single/partial-word data response in 
response to a processor double/single/partial-word read request. 


An external agent issues an external double/single/partial-word data response with a single 
data cycle; the data cycle consists of: 


* asserting SysCmd[11] 
* driving the request number associated with the corresponding processor 
request on SysCmd[10:8] 
* driving the data quality indication on SysCmd[5] 
* driving the response last data type indication on SysCmd[4:3] 
* driving the ECC check indication on SysCmd[0] 
* driving the data on SysAD[63:0] 
* asserting SysVal* 


Figure 6-18 depicts a processor double/single/partial-word read request and the 
corresponding external double/single/partial-word data response. 


Figure 6-18 External Double/Single/Partial-Word Data Response Protocol 


External Completion Response Protocol 


An external agent issues an external completion response to provide an acknowledge, error, 
or negative acknowledge to an outstanding request, and to free the associated request 
number. 


An external agent issues an external completion response by driving the response on 
SysResp[4:0] and asserting SysRespVal* for one SysClk cycle. SysResp[4:2] contains 
the request number associated with the corresponding outstanding request and 
SysResp[1:0] contains an acknowledge, error, or negative acknowledge indication, as 
described below: 


* The external agent issues an external ACK completion response for a 
processor read or upgrade request to indicate that the request was successful. 
An external ACK completion response may only be issued for a processor 
read request if a corresponding external data response is coincidentally or 
previously issued. 


* The external agent issues an external ERR completion response for a 
processor read or upgrade request to indicate that the request was 
unsuccessful. Upon receiving an external ERR completion response, the 
processor takes a Bus Error exception on the associated instruction. If the 
processor read or upgrade request was caused by a PREFETCH instruction, no 
exception is taken. Also, if the request was caused by a speculative 
instruction, no exception is taken. 


* The external agent issues an external NACK completion response for a 
processor read or upgrade request to indicate that the request was not 
accepted. Upon receiving an external NACK completion response, the 
processor re-evaluates the associated instruction. Due to the speculative 
nature of the R10000 processor, the re-evaluation may or may not result in the 
reissue of a similar processor request. 


An external ERR or NACK completion response issued in response to an external 
intervention, allocate request number, or invalidate has no effect on the processor except to 
free the request number. 
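
The SysResp[4:0] layout described above can be sketched as a pair of helper functions. The names are hypothetical, and the indication is handled as a raw two-bit value because the ACK, ERR, and NACK encodings are not given in this section.

```python
from typing import NamedTuple


class CompletionResponse(NamedTuple):
    request_number: int  # SysResp[4:2]
    indication: int      # SysResp[1:0], raw value (encoding not given here)


def pack_sysresp(request_number, indication):
    """Build the 5-bit SysResp value for an external completion response."""
    if not 0 <= request_number <= 7:
        raise ValueError("request number must fit in SysResp[4:2]")
    if not 0 <= indication <= 3:
        raise ValueError("indication must fit in SysResp[1:0]")
    return (request_number << 2) | indication


def unpack_sysresp(value):
    """Split a 5-bit SysResp value into its two fields."""
    if not 0 <= value <= 0x1F:
        raise ValueError("SysResp is five bits wide")
    return CompletionResponse(value >> 2, value & 0x3)
```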



Figure 6-19 depicts a processor upgrade request and a corresponding external completion 
response. 


Figure 6-19 External Completion Response Protocol 


An external agent issues an external request when it requires a resource within the 
processor. The external agent refers to any device attached to the processor System 
interface. It may be a memory interface or cluster coordinator ASIC, or another processor 
residing on the cluster bus. 


An external agent may only issue an external request to the processor when the System 
interface is in slave state. If the System interface is not already in slave state, the external 
agent must first negate SysGnt* and then wait for the processor to assert SysRel*. If the 
System interface is already in slave state, the external agent may issue an external request 
immediately. The total number of outstanding external requests, including interventions, 
allocate request numbers, and invalidates, cannot exceed eight. 


External requests may be accepted by the processor in adjacent SysClk cycles. External 
intervention and invalidate requests are considered external coherency requests. 
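
The issue conditions above (slave state, and no more than eight outstanding external requests) can be modeled with a simple counter. This is a minimal sketch; the class and method names are hypothetical, and a request is assumed to stop being outstanding when its external completion response is issued.

```python
class ExternalRequestCounter:
    """Tracks outstanding external requests against the limit of eight."""

    MAX_OUTSTANDING = 8

    def __init__(self):
        self.outstanding = 0

    def can_issue(self, in_slave_state):
        # Both conditions from the text must hold.
        return in_slave_state and self.outstanding < self.MAX_OUTSTANDING

    def issue(self, in_slave_state):
        if not self.can_issue(in_slave_state):
            raise RuntimeError("external request may not be issued now")
        self.outstanding += 1

    def complete(self):
        # Assumed: called when the external completion response is issued.
        if self.outstanding == 0:
            raise RuntimeError("no outstanding external request")
        self.outstanding -= 1
```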


External Intervention Request Protocol 


An external agent issues an external intervention request to obtain a Shared or Exclusive 
copy of a secondary cache block. 


An external agent issues an external intervention request with a single address cycle; this 
address cycle consists of the following: 


* negating SysCmd[11] 
* driving a request number on SysCmd[10:8] 
* driving the intervention command on SysCmd[7:5] 
* driving the ECC check indication on SysCmd[0] 
* driving the target indication on SysAD[63:60] 
* driving the physical address on SysAD[39:0] 
* asserting SysVal* 


An external agent may only issue an external intervention request address cycle when the 
System interface is in slave state; typically a free request number is specified. An external 
agent may have as many as eight external intervention requests outstanding on the System 
interface at any given time. 
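
The address cycle fields listed above can be sketched as follows. The function name is hypothetical, the command is passed as a raw three-bit value because the command encodings are not given in this section, and driving zero on the unlisted bits is an assumption. The external invalidate request uses the same field positions.

```python
def pack_coherency_address_cycle(request_number, command, ecc_check, target, phys_addr):
    """Return (SysCmd, SysAD) for an external intervention or invalidate address cycle."""
    if not 0 <= request_number <= 7:
        raise ValueError("request number must fit in SysCmd[10:8]")
    if not 0 <= command <= 7:
        raise ValueError("command must fit in SysCmd[7:5]")
    if ecc_check not in (0, 1):
        raise ValueError("ECC check indication is a single bit")
    if not 0 <= target <= 0xF:
        raise ValueError("target indication must fit in SysAD[63:60]")
    if not 0 <= phys_addr < (1 << 40):
        raise ValueError("physical address must fit in SysAD[39:0]")
    # SysCmd[11] stays 0 (negated) to mark an address cycle.
    syscmd = (request_number << 8) | (command << 5) | ecc_check
    sysad = (target << 60) | phys_addr
    return syscmd, sysad
```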


Figure 6-20 depicts three external intervention requests. Since the System interface is 
initially in master state, the external agent must first negate SysGnt* and then wait until the 
processor relinquishes mastership of the System interface by asserting SysRel*. 


Figure 6-20 External Intervention Request Protocol 


External Allocate Request Number Request Protocol 


An external agent issues an external allocate request number request to reserve a request 
number for private use. Once allocated, the processor is prevented from using the request 
number until an external completion response for the request number is received. 


An external agent issues an external allocate request number request with a single address 
cycle; this address cycle consists of the following: 


* negating SysCmd[11] 
* driving a free request number on SysCmd[10:8] 
* driving the allocate request number command on SysCmd[7:5] 
* asserting SysVal* 


An external agent may only issue an external allocate request number request address cycle 
when the System interface is in slave state and there is a free request number. The external 
agent may have as many as eight external allocate request number requests outstanding on 
the System interface at any given time. 


Figure 6-21 depicts three external allocate request number requests. Since the System 
interface is initially in master state, the external agent must first negate SysGnt* and then 
wait until the processor relinquishes mastership of the System interface by asserting 
SysRel*. 


Figure 6-21 External Allocate Request Number Request Protocol 


External Invalidate Request Protocol 


An external agent issues an external invalidate request to invalidate a secondary cache 
block. 


An external agent issues an external invalidate request with a single address cycle. This 
address cycle consists of the following: 


* negating SysCmd[11] 
* driving a request number on SysCmd[10:8] 
* driving the invalidate command on SysCmd[7:5] 
* driving the ECC check indication on SysCmd[0] 
* driving the target indication on SysAD[63:60] 
* driving the physical address on SysAD[39:0] 
* asserting SysVal* 


An external agent may only issue an external invalidate request address cycle when the 
System interface is in slave state; typically a free request number is specified. An external 
agent may have as many as eight external invalidate requests outstanding on the System 
interface at any given time. 


Figure 6-22 depicts three external invalidate requests. Since the System interface is initially 
in master state, the external agent must first negate SysGnt* and then wait until the 
processor relinquishes mastership of the System interface by asserting SysRel*. 


External Interrupt Request Protocol 


An external agent issues an external interrupt request to interrupt the normal instruction 
flow of the processor. 


An external agent issues an external interrupt request with a single address cycle. This 
address cycle consists of the following: 


* negating SysCmd[11] 
* driving the special command on SysCmd[7:5] 
* driving the interrupt special cause indication on SysCmd[4:3] 
* driving the ECC check indication on SysCmd[0] 
* driving the target indication on SysAD[63:60] 
* driving the Interrupt register write enables on SysAD[20:16] 
* driving the Interrupt register values on SysAD[4:0] 
* asserting SysVal* 


An external agent may only issue an external interrupt request address cycle when the 
System interface is in slave state. 
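
The SysAD fields of the interrupt address cycle can be sketched as shown below. The function names are hypothetical. The update rule in `apply_interrupt_write` is an assumption about what "write enables" means (each enabled bit takes the supplied value, each disabled bit keeps its current value); the text above only gives the field positions.

```python
def pack_interrupt_sysad(target, write_enables, values):
    """Build SysAD[63:0] for an external interrupt request address cycle."""
    if not 0 <= target <= 0xF:
        raise ValueError("target indication must fit in SysAD[63:60]")
    if not 0 <= write_enables <= 0x1F or not 0 <= values <= 0x1F:
        raise ValueError("write enables and values are five bits each")
    return (target << 60) | (write_enables << 16) | values


def apply_interrupt_write(current, write_enables, values):
    """Assumed semantics: only bits with their write enable set are updated."""
    kept = current & ~write_enables & 0x1F
    written = values & write_enables
    return kept | written
```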


Figure 6-23 depicts three external interrupt requests. Since the System interface is initially 
in master state, the external agent must first negate SysGnt* and then wait until the 
processor relinquishes mastership of the System interface by asserting SysRel*. 


Figure 6-23 External Interrupt Request Protocol 


Processor Response Protocol 


Processor responses are supplied by the processor in response to external coherency 
requests that target the processor. The R10000 processor issues a processor coherency state 
response for each external coherency request that targets the processor. The processor 
issues a processor coherency data response for each external intervention request that 
targets the processor and hits a DirtyExclusive secondary cache block. 


Processor coherency state responses are issued by the processor in the same order that the 
corresponding external coherency requests are received. Processor coherency state and 
data responses may occur in adjacent SysClk cycles. 


Processor Coherency State Response Protocol 


A processor coherency state response results from an external coherency request that 
targets the processor. 


Errata 


The processor issues a processor coherency state response by driving the secondary cache 
block tag quality indication on SysState[2], driving the secondary cache block former state 
on SysState[1:0], and asserting SysStateVal* for one SysClk cycle. The processor 
coherency state responses are issued in an order designated by the external coherency 
requests and will always be issued before an associated processor coherency data response. 
Note that processor coherency state responses can be pipelined ahead of the associated 
processor coherency data responses, and processor coherency data responses can be 
returned out-of-order. These cases typically arise from external coherency requests hitting 
outgoing buffer entries. 


Figure 6-24 depicts two external coherency requests and the resulting processor coherency 
state responses. 


Figure 6-24 Processor Coherency State Response Protocol 


Processor Coherency Data Response Protocol 


A processor coherency data response results from an external intervention request that 
targets the processor and hits a DirtyExclusive secondary cache block. 


The processor issues a processor coherency data response with a single empty cycle 
followed by either 8 or 16 data cycles. The empty cycle consists of negating SysVal* for a 
single SysClk cycle. The data cycles consist of the following: 


* asserting SysCmd[11] 
* driving the request number associated with the corresponding external 
coherency request on SysCmd[10:8] 
* driving the data quality indication on SysCmd[5] 
* driving the data type indication on SysCmd[4:3] 
* driving the state of the cache block on SysCmd[2:1] 
* asserting SysCmd[0] 
* driving the data on SysAD[63:0] 
* asserting SysVal* 


The first 7 or 15 data cycles have a response data type indication, and the last data cycle has 
a response last data indication. The processor may negate SysVal* between data cycles of 
a processor coherency data response only if the SCClk frequency is less than half of the 
SysClk frequency. 


The processor may only issue a processor coherency data response when the System 
interface is in master state and SysWrRdy* was asserted two SysClk cycles previously. 
Note that the empty cycle is considered the issue cycle for a processor coherency data 
response. If the System interface is not already in master state, the processor must first 
assert SysReq*, and then wait for the external agent to relinquish mastership of the System 
interface bus by asserting SysGnt* and SysRel*. If the System interface is already in 
master state, the processor may issue a processor coherency data response immediately. 
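
The issue condition above can be written as a small predicate. This is a sketch only: the function name is hypothetical, and the SysWrRdy* history is modeled as a list of booleans indexed by SysClk cycle.

```python
def can_issue_data_response(cycle, is_master, wr_rdy_asserted):
    """True if a processor coherency data response may issue in `cycle`.

    wr_rdy_asserted[c] is True when SysWrRdy* was asserted in SysClk cycle c.
    The issue cycle is the empty cycle that precedes the data cycles.
    """
    if cycle < 2:
        return False  # no sample from two cycles earlier exists yet
    return is_master and wr_rdy_asserted[cycle - 2]
```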


Errata 


When SysStateVal* is negated, SysState[0] provides the processor coherency data 
response indication. The processor asserts the processor coherency data response indication 
when there are one or more processor coherency data responses pending issue in the 
outgoing buffer. Once asserted, the indication is negated when the first doubleword of the 
last pending issue processor coherency data response is issued to the system interface bus. 
The processor coherency data response indication is not affected by SysWrRdy*. 
However, as previously noted the processor may only issue a processor coherency data 
response when SysWrRdy* was asserted two SysClk cycles previously. 


Processor coherency data response data is supplied in subblock order, beginning with the 
quadword-aligned address specified by the corresponding external coherency request. 


Processor coherency data responses are not necessarily issued in the same order as the 
external coherency requests; however each processor coherency data response always 
follows the corresponding processor coherency state response. Note that more than one 
processor coherency state response may be pipelined ahead of the corresponding processor 
coherency data responses. 


Figure 6-25 depicts one external coherency request and the resulting processor coherency 
state and data responses. 


Figure 6-25 Processor Coherency Data Response Protocol 


6.18 System Interface Coherency 


The System interface supports external intervention shared, intervention exclusive, and 
invalidate coherency requests. These requests are used by an external agent or other 
R10000 processors on the cluster bus to maintain cache coherency. 


Each external coherency request that targets an R10000 results in a processor coherency 
state response. Additionally, each external intervention request that targets the R10000 and 
hits a DirtyExclusive secondary cache block results in a processor coherency data response. 


External coherency requests and the corresponding processor coherency state responses are 
handled in FIFO order. 


External Intervention Shared Request 


An external intervention shared request is used by an external agent to obtain a Shared copy 
of a cache block. If the desired block resides in the processor cache, it is marked Shared. 


If the secondary cache block’s former state was DirtyExclusive, the processor issues a 
processor coherency data response. 


External Intervention Exclusive Request 


An external intervention exclusive request is used by an external agent to obtain an 
Exclusive copy of a cache block. If the desired block resides in the processor cache, it is 
marked Invalid. 


If the secondary cache block’s former state was DirtyExclusive, the processor issues a 
processor coherency data response. 


External Invalidate Request 


An external invalidate request is used by an external agent to invalidate a cache block. If 
the desired block resides in the processor cache, it is marked Invalid. 


Under normal circumstances, the secondary cache block former state should not be 
CleanExclusive or DirtyExclusive. 


External Coherency Request Action 


Table 6-27 indicates the action taken for external coherency requests that target the 
processor. 


Table 6-27 Action Taken for External Coherency Requests that Target the R10000 Processor† 


Secondary Cache | Type of External       | Secondary Cache | Processor Coherency | Processor Coherency | Processor Coherency 
Block Former    | Request                | Block New State | State Response      | Data Response       | Data Response State 
State           |                        |                 | SysState[1:0]       | Required?           | SysCmd[2:1] 
----------------+------------------------+-----------------+---------------------+---------------------+-------------------- 
Invalid         | Intervention shared    | Invalid         | 0                   | No                  | N/A 
Invalid         | Intervention exclusive | Invalid         | 0                   | No                  | N/A 
Invalid         | Invalidate             | Invalid         | 0                   | No                  | N/A 
Shared          | Intervention shared    | Shared          | 1                   | No                  | N/A 
Shared          | Intervention exclusive | Invalid         | 1                   | No                  | N/A 
Shared          | Invalidate             | Invalid         | 1                   | No                  | N/A 
CleanExclusive  | Intervention shared    | Shared          | 2                   | No                  | N/A 
CleanExclusive  | Intervention exclusive | Invalid         | 2                   | No                  | N/A 
CleanExclusive  | Invalidate‡            | Invalid         | 2                   | No                  | N/A 
DirtyExclusive  | Intervention shared*   | Shared          | 3                   | Yes                 | Shared 
DirtyExclusive  | Intervention exclusive | Invalid         | 3                   | Yes                 | DirtyExclusive 
DirtyExclusive  | Invalidate‡            | Invalid         | 3                   | No                  | N/A 


‡ This should not occur under normal circumstances. 


* The processor coherency data response must be written back to memory. 


† These actions are taken in cases where there are no internal coherency conflicts. For 
exceptions due to internal coherency conflicts, please refer to Table 6-28. 
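
The actions in Table 6-27 can be expressed as a small lookup function. This is an illustrative sketch with hypothetical names; it covers only the case without internal coherency conflicts, and it returns `None` where the table shows that no processor coherency data response is required.

```python
STATE_CODE = {"Invalid": 0, "Shared": 1, "CleanExclusive": 2, "DirtyExclusive": 3}
REQUESTS = ("intervention shared", "intervention exclusive", "invalidate")


def coherency_action(former_state, request):
    """Return (new_state, SysState[1:0] value, data response state or None)."""
    if request not in REQUESTS:
        raise ValueError(f"unknown external coherency request: {request}")
    code = STATE_CODE[former_state]  # the state response reports the former state

    # New state: an Invalid block stays Invalid; intervention shared leaves a
    # Shared copy; intervention exclusive and invalidate leave the block Invalid.
    if former_state == "Invalid":
        new_state = "Invalid"
    elif request == "intervention shared":
        new_state = "Shared"
    else:
        new_state = "Invalid"

    # A data response is required only when an intervention hits a
    # DirtyExclusive block.
    data_state = None
    if former_state == "DirtyExclusive":
        if request == "intervention shared":
            data_state = "Shared"
        elif request == "intervention exclusive":
            data_state = "DirtyExclusive"
    return new_state, code, data_state
```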


Coherency conflicts arise when a processor request and an external request target the same 
secondary cache block. Coherency conflicts may be categorized as either internal or 
external, and are described in this section. 


Internal Coherency Conflicts 


A processor request is considered to be pending issue when it is buffered in the processor 
and has not yet been issued to the System interface bus. Internal coherency conflicts occur 
when the processor has a processor request pending issue and a conflicting external 
coherency request is received. Internal coherency conflicts are unavoidable and cannot be 
anticipated by the external agent since it cannot anticipate when the processor will have 
processor requests pending issue. 


Table 6-28 describes the manner in which the processor resolves internal coherency 
conflicts. 


Table 6-28 Internal Coherency Conflict Resolution 


Processor Request   | Conflicting External   | Resolution 
Pending Issue       | Coherency Request      | 
--------------------+------------------------+------------------------------------------------------- 
Coherent block read | Intervention shared    | The processor allows the conflicting external 
                    | Intervention exclusive | coherency request to proceed and provides an Invalid 
                    | Invalidate             | processor coherency state response. The processor 
                    |                        | stalls the processor coherent block read request until 
                    |                        | the conflicting external coherency request has 
                    |                        | received an external completion response. 
--------------------+------------------------+------------------------------------------------------- 
Upgrade             | Intervention shared    | The processor allows the conflicting external 
                    | Intervention exclusive | coherency request to proceed and provides a Shared 
                    | Invalidate             | processor coherency state response. Once the 
                    |                        | conflicting external coherency request has received 
                    |                        | an external completion response, the processor 
                    |                        | internally NACKs the processor upgrade request that 
                    |                        | is pending issue. 
--------------------+------------------------+------------------------------------------------------- 
Block write         | Intervention shared    | The processor provides a DirtyExclusive processor 
                    | Intervention exclusive | coherency state response and changes the processor 
                    |                        | block write request that is pending issue into a 
                    |                        | DirtyExclusive processor coherency data response. 
                    +------------------------+------------------------------------------------------- 
                    | Invalidate             | The processor provides a DirtyExclusive processor 
                    |                        | coherency state response and deletes the processor 
                    |                        | block write request that is pending issue. 
--------------------+------------------------+------------------------------------------------------- 
Eliminate           | Intervention shared    | The processor provides a Shared or CleanExclusive 
                    | Intervention exclusive | processor coherency state response and deletes the 
                    | Invalidate             | processor eliminate request that is pending issue.* 


* If the processor eliminate request that is pending issue has a DirtyExclusive state, a 
CleanExclusive processor coherency state response is provided. 


External Coherency Conflicts 


Errata 


A processor request is considered to be pending response when it has been issued to the 
System interface bus but has not yet received an external data or completion response. 
External coherency conflicts occur when the processor has a processor request that is 
pending response and a conflicting external coherency request is received. The processor 
relies on the external agent to detect and resolve external coherency conflicts. If the 
external agent chooses to issue an external coherency request to the processor which causes 
an external coherency conflict, the external coherency request must be completed before an 
external response is given to the conflicting processor request. 


External coherency conflicts may be avoided if the point of coherence is the processor 
System interface bus and only one request is allowed to be outstanding for any given 
secondary cache block. However, in some system designs external coherency conflicts are 
unavoidable. 


Processor block write and eliminate requests are never pending response, and therefore 
cannot cause external coherency conflicts. 


Table 6-29 describes the manner in which the external agent resolves external coherency 
conflicts. 


Table 6-29 External Coherency Conflict Resolution 


Processor Requests that | Conflicting External   | Resolution 
are Pending Response    | Coherency Request      | 
------------------------+------------------------+-------------------------------------------------------- 
Coherent block read     | Intervention shared    | The external agent responds to the external coherency 
                        | Intervention exclusive | requestor that the block is Invalid. At some later time, 
                        | Invalidate             | the external agent supplies an external response to the 
                        |                        | processor coherent block read request that is pending 
                        |                        | response.‡ 
------------------------+------------------------+-------------------------------------------------------- 
Upgrade                 | Intervention shared    | The external agent responds to the external coherency 
                        |                        | requestor that the block is Shared. At some later time, 
                        |                        | the external agent supplies an external response to the 
                        |                        | processor upgrade request that is pending response.* 
                        +------------------------+-------------------------------------------------------- 
                        | Intervention exclusive | The external agent issues the conflicting external 
                        | Invalidate             | coherency request to the processor. The processor 
                        |                        | allows the conflicting external coherency request to 
                        |                        | proceed and supplies a Shared processor coherency 
                        |                        | state response. After observing the processor 
                        |                        | coherency state response, the external agent provides 
                        |                        | an external ACK completion response for the 
                        |                        | conflicting external coherency request. At some later 
                        |                        | time, the external agent supplies an external response 
                        |                        | for the processor upgrade request that is pending 
                        |                        | response. This external response may not be an 
                        |                        | external ACK completion response unless it is 
                        |                        | associated with an external block data response. 


‡ Although it is not required, the external agent may choose to issue the conflicting external 
coherency request to R10000 and the processor will return an Invalid processor coherency 
state response. 


* Although it is not required, the external agent may choose to issue the conflicting external 
coherency request to R10000 and the processor will return a Shared processor coherency 
state response. 


Errata 


Revised the two footnotes in Table 6-29 above. 


External Coherency Request Latency 


This section describes the R10000 external coherency request latency. Figure 6-26 depicts 
the following: 


* an external coherency request which targets the processor 
* the resulting processor coherency state response 
* the potential processor coherency data response 


Two external coherency request latency parameters are also defined: 


* the processor coherency state response latency, tpcsr, specifies the time from 
the external coherency request to the processor coherency state response 


* the processor coherency data response latency, tpcdr, specifies the time from 
the external coherency request to the processor coherency data response if a 
master, or to the assertion of the processor coherency data response indication 
on SysState[0] if a slave. 


Figure 6-26 External Coherency Request Latency Parameters 


The external coherency request latency is presented in Table 6-30. 


Table 6-30 External Coherency Request Latency 


                         Latency‡ (PClk cycles) 
         | Processor Coherency State | Processor Coherency Data 
         | Response (tpcsr)          | Response* (tpcdr) 
SCClkDiv | Min†   | Typ‡‡  | Max**   | Min††  | Typ‡‡‡ | Max*** 
---------+--------+--------+---------+--------+--------+------- 
1        | 5      | 10     | 39      | 8      | 28     | 70 
1.5      | 5      | 13     | 48      | 8      | 33     | 88 
2        | 5      | 14     | 59      | 8      | 38     | 105 
2.5      | 5      | 16     | 71      | 8      | 43     | 128 
3        | 5      | 17     | 79      | 8      | 43     | 141 


‡ This latency assumes no other previously issued external coherency requests are outstanding. 1 
to 3 additional PClk cycles may be required for synchronization with SysClk depending on the 
SysClkDiv mode bits. 


* This value assumes a 32-word secondary cache block size. 


† This value assumes the external coherency request hits a cached or outgoing buffer entry. 


‡‡ This value assumes the external coherency request does not hit a cached or outgoing buffer entry, 
the secondary cache is not busy, and the external coherency request hits in the MRU way of the 
secondary cache. If the external coherency request misses in the most-recently used (MRU) way 
of the secondary cache, 1 to 3 additional PClk cycles are required to query the LRU way of the 
secondary cache, depending on the SCClkDiv mode bits. 


** This value assumes the external coherency request does not hit a cached or outgoing buffer entry, 
the secondary cache just commenced an index-conflicting CACHE Hit WriteBack Invalidate (S), 
and the external coherency request misses in the secondary cache MRU way. 


†† This value assumes the external coherency request hits an outgoing buffer entry. 


‡‡‡ This value assumes the external coherency request does not hit a cached or outgoing buffer entry, 
the secondary cache is not busy, the external coherency request hits in the MRU way of the 
secondary cache, no subset primary data cache blocks are inconsistent, and the external coherency 
request is secondary cache block-aligned. If the external coherency request misses in the MRU 
way of the secondary cache, 1 to 3 additional PClk cycles are required to query the LRU way of 
the secondary cache, depending on the SCClkDiv mode bits. 


*** This value assumes the external coherency request does not hit a cached or outgoing buffer entry, 
the secondary cache just commenced an index-conflicting CACHE Hit WriteBack Invalidate (S), 
the external coherency request hits in the LRU way of the secondary cache, all subset primary 
data cache blocks are inconsistent, and the external coherency request is not secondary cache 
block-aligned. 
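
Because Table 6-30 is expressed in PClk cycles, converting an entry to time requires the PClk frequency. The sketch below assumes a 200 MHz PClk purely as an example; the function name is hypothetical, and the optional synchronization term models the 1 to 3 additional PClk cycles mentioned in the first footnote.

```python
def latency_ns(pclk_cycles, pclk_mhz, sync_cycles=0):
    """Convert a latency in PClk cycles to nanoseconds.

    sync_cycles -- extra PClk cycles for synchronization with SysClk (0 to 3)
    """
    if not 0 <= sync_cycles <= 3:
        raise ValueError("synchronization adds at most 3 PClk cycles")
    return (pclk_cycles + sync_cycles) * 1000.0 / pclk_mhz


# Maximum state response latency for SCClkDiv = 1 (39 PClk cycles),
# at an assumed 200 MHz PClk.
worst_case_state_ns = latency_ns(39, 200.0, sync_cycles=3)
```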


The SysGblPerf* signal is provided for systems implementing a relaxed consistency 
memory model. The external agent asserts this signal when all processor requests are 
globally performed, thereby allowing the processor to graduate SYNC instructions. The 
external agent negates this signal when some processor requests are not yet globally 
performed, thereby preventing the processor from graduating SYNC instructions. 


To prevent a SYNC instruction from graduating, the external agent must negate the 
SysGblPerf* signal no later than the same SysClk cycle in which it issued the external 
completion response for a processor read or upgrade request which is not yet globally 
performed. Also, the external agent must negate the SysGblPerf* signal no later than two 
SysClk cycles after the address cycle of a processor double/single/partial-word write 
request which has not yet been globally performed. 


The SysGblPerf* signal may be permanently asserted in systems implementing a 
sequential consistency memory model. 


6.19 Cluster Bus Operation 


An R10000 multiprocessor cluster may be created by directly attaching the System 
interfaces of 2 to 4 R10000 processors, and providing an external cluster coordinator to 
handle arbitration and coherency management. 


The cluster coordinator arbitrates the multiprocessors using the SysReq*, SysGnt*, and 
SysRel* signals. 


A processor request issued by an R10000 processor in master state is observed as an 
external request by any R10000 processors in the slave state on the cluster bus. This 
relationship is described in Table 6-31. 


Table 6-31 Relationship Between Processor and External Requests for the Cluster Bus 


Processor Request                   External Request 
Coherent block read shared          Intervention shared 
Coherent block read exclusive       Intervention exclusive 
Noncoherent block read              Allocate request number 
Double/single/partial-word read     Allocate request number 
Block write                         NOP 
Double/single/partial-word write    NOP 
Upgrade                             Invalidate 
Eliminate                           NOP 
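

The mapping in Table 6-31 can be sketched as a simple lookup. This is only an illustration of how a cluster bus model might translate what a master issues into what the slaves observe; the dictionary and function names are hypothetical and are not part of the processor or of any tool.

```python
# Table 6-31 as a lookup: processor request issued by the master ->
# external request observed by processors in slave state.
# The names below are illustrative only.
CLUSTER_BUS_OBSERVED_AS = {
    "coherent block read shared": "intervention shared",
    "coherent block read exclusive": "intervention exclusive",
    "noncoherent block read": "allocate request number",
    "double/single/partial-word read": "allocate request number",
    "block write": "NOP",
    "double/single/partial-word write": "NOP",
    "upgrade": "invalidate",
    "eliminate": "NOP",
}


def observed_external_request(processor_request: str) -> str:
    """Return the external request a slave processor observes."""
    return CLUSTER_BUS_OBSERVED_AS[processor_request.lower()]


if __name__ == "__main__":
    for request in ("Upgrade", "Block write", "Coherent block read shared"):
        print(f"{request} -> {observed_external_request(request)}")
```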


In the same manner, a processor coherency data response issued by a processor in the 
master state is observed as an external block data response by any processors in the slave 
state. 


External coherency requests that target a processor are handled in FIFO order and result in 
processor coherency state responses. If an external coherency request that targets a 
processor hits a DirtyExclusive secondary cache block, the processor also provides a 
processor coherency data response. 


Figure 6-27 presents an example of a processor read request with four R10000 processors 
residing on the cluster bus. The CohPrcReqTar mode bit is asserted for a snoopy-based 
coherency protocol. R10000₀ issues a processor coherent read exclusive request. This is 
observed as an external intervention exclusive request by R10000₁, R10000₂, and R10000₃. 
R10000₁ and R10000₃ respond with Invalid processor coherency state responses. R10000₂ 
responds with a DirtyExclusive processor coherency state response. Based on these 
processor coherency state responses, the cluster coordinator allows R10000₂ to become 
master of the System interface so that it may provide a processor coherency data response, 
which will be observed as an external block data response by R10000₀. Finally, the cluster 
coordinator issues an external ACK completion response to forward the external block data 
response and to free the request number. 


Figure 6-28 presents an example of a processor upgrade request with four R10000 
processors residing on the cluster bus. The CohPrcReqTar mode bit is asserted for a 
snoopy-based coherency protocol. R10000₀ issues a processor upgrade request, observed 
as an external invalidate request by R10000₁, R10000₂, and R10000₃. R10000₁ and 
R10000₃ provide Shared processor coherency state responses. R10000₂ provides an 
Invalid processor coherency state response. Based on these processor coherency state 
responses, the cluster coordinator issues an external ACK completion response for the 
processor upgrade request to indicate that the request was successful and to free the request 
number. 


Figure 6-27 R10000 Multiprocessor Cluster Processor Read Request Example 


Figure 6-28 R10000 Multiprocessor Cluster Processor Upgrade Request Example 


6.20 Support for I/O 


The processor assumes a memory-mapped I/O model. Consequently, no special System 
interface encodings are provided or required to designate I/O accesses. It is left to the 
programmer to ensure that I/O addresses have the appropriate TLB mappings. 


The processor supports system designs utilizing hardware or software for coherent I/O. 
The external coherency requests are useful for creating systems with hardware I/O 
coherency, and the CACHE instruction is sufficient for creating a system with software I/O 
coherency. 


6.21 Support for External Duplicate Tags 
Some system designs implement an external duplicate copy of the secondary cache tags to 
reduce the coherency request latency and also filter out unnecessary external coherency 
requests made to the R10000 processor. 


For such systems, it must be remembered that blocks may reside in either the secondary 
cache or in the outgoing buffer. During the address cycle of processor block read requests, 
the secondary cache block former state is provided. The external agent may use this 
information to maintain the external duplicate tags. 


Typically, in a multiprocessor system using the cluster bus, the cluster coordinator specifies 
a free request number for an external coherency request. However, in a system using a 
duplicate-tag or directory-based coherency protocol, where the CohPrcReqTar mode bit 
is negated, the cluster coordinator may specify a busy request number for an external 
coherency request, provided each targeted R10000 processor has the request number busy 
due to an outstanding processor coherency request from another processor. 


For example, suppose the processor in master state issues a processor coherent block read 
or upgrade request. The processors in slave state observe the processor request as an 
external coherency request that targets the external agent only, causing the associated 
request number to become busy. The cluster coordinator checks the duplicate tag or 
directory structure to determine if the block resides in the cache of one of the processors 
that was in slave state. If necessary, the cluster coordinator issues an external coherency 
request targeted at one or more of the processors that were in slave state. By using the same 
request number as the original processor request, this external coherency request does not 
consume a free request number, and allows a potential processor coherency data response 
to be supplied as an external block data response to the original processor request. 


6.22 Support for a Directory-Based Coherency Protocol 
Some system designs implement a directory-based coherency protocol. 


For such systems, the processor provides the processor eliminate request cycle. If the 
PrcElmReq mode bit is asserted, the processor issues a processor eliminate request 
whenever it intends to eliminate a Shared, CleanExclusive, or DirtyExclusive block from 
the secondary cache. During the address cycle of the processor eliminate request, the 
physical address and the secondary cache block former state are provided. The external 
agent may then use this information to maintain an external directory structure. 


6.23 Support for Uncached Attribute 
The processor supports a 2-bit user-defined Uncached Attribute, which is driven on 
SysAD[59:58] during the address cycle of the following: 
• processor double/single/partial-word read requests 
• double/single/partial-word write requests 
• block write requests resulting from completely gathered uncached accelerated 
blocks 


For unmapped accesses, the uncached attribute is sourced from VA[58:57]. 


For mapped accesses, the uncached attribute is sourced from the TLB Uncached Attribute 
field. The TLB Uncached Attribute field may be initialized in 64-bit mode using bits 63:62 
of the CP0 EntryLo0 and EntryLo1 registers. 
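

The bit positions given above can be illustrated with a few shifts and masks. The following is a minimal sketch, assuming 64-bit integer values for the virtual address, the EntryLo value, and the address-cycle word; the function names are hypothetical and only model the bit fields named in the text.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def uncached_attr_from_va(va: int) -> int:
    """Unmapped access: the attribute is sourced from VA[58:57]."""
    return (va >> 57) & 0x3


def uncached_attr_from_entrylo(entrylo: int) -> int:
    """Mapped access: attribute as initialized through bits 63:62 of
    EntryLo0/EntryLo1 in 64-bit mode."""
    return (entrylo >> 62) & 0x3


def drive_on_sysad(address_cycle_word: int, attr: int) -> int:
    """Place the 2-bit attribute on SysAD[59:58], leaving other bits as-is."""
    field_mask = 0x3 << 58
    cleared = address_cycle_word & ~field_mask & MASK64
    return cleared | ((attr & 0x3) << 58)


if __name__ == "__main__":
    va = (0b10 << 57) | 0x1000          # example unmapped virtual address
    attr = uncached_attr_from_va(va)
    print(attr, hex(drive_on_sysad(0, attr)))
```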


6.24 Support for Hardware Emulation 


When using the R10000 processor in hardware emulation, it is desirable to operate the 
System interface at a relatively low frequency (typically 1 MHz or below). Because the R10000 
processor contains dynamic circuitry, an external agent cannot simply supply a low-frequency 
SysClk. Instead, the SysCyc* input to the processor allows an external agent to define 
a virtual system clock while still supplying a SysClk within the acceptable operating range. The 
assertion of SysCyc* in a particular SysClk cycle creates a virtual system clock pulse four 
SysClk cycles later. SysCyc* may be asserted aperiodically. 


In a normal system environment, the SysCyc* input should be permanently asserted. 


Figure 6-29 depicts the use of SysCyc* to create a virtual SysClk of one-third the normal 
SysClk frequency. 
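

The relationship between SysCyc* assertions and virtual clock pulses can be sketched as follows. This is an illustrative model only: it assumes SysCyc* is sampled once per SysClk cycle and represents "asserted" as True; the function and constant names are hypothetical.

```python
VIRTUAL_PULSE_DELAY = 4  # SysClk cycles between SysCyc* assertion and the pulse


def virtual_clock_pulses(syscyc_asserted):
    """Given one boolean per SysClk cycle (True = SysCyc* asserted),
    return the SysClk cycle numbers at which virtual clock pulses occur."""
    return [
        cycle + VIRTUAL_PULSE_DELAY
        for cycle, asserted in enumerate(syscyc_asserted)
        if asserted
    ]


if __name__ == "__main__":
    # Asserting SysCyc* every third SysClk cycle yields a virtual SysClk
    # at one-third of the SysClk frequency.
    pattern = [cycle % 3 == 0 for cycle in range(9)]
    print(virtual_clock_pulses(pattern))
```

Because each assertion is handled independently, the same function also covers the aperiodic case mentioned above.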


Figure 6-29 Hardware Emulation Protocol 


7. Clock Signals 


The R10000 processor has differential PECL clock inputs, SysClk and SysClk*, from 
which all processor internal clock signals and secondary cache clock signals are derived. 


Three major clock domains are in the processor: 


• the System interface clock domain, which operates at the system clock 
frequency and controls the System interface signals 


• the internal processor clock domain, which controls the processor core logic 


• the secondary cache clock domain, which controls signals communicating 
with the external secondary cache synchronous SRAM 


These domains are described in this chapter. 


7.1 System Interface Clock and Internal Processor Clock Domains 
In high performance systems, PECL-level differential clocks are routinely used to 
minimize system clock skews. The R10000 processor receives differential system clock 
signals at the SysClk and SysClk* pins; two additional pins, SysClkRet and SysClkRet*, 
are the return paths for termination of these signals. 


SysClk and SysClk* are used to drive an on-chip phase-locked loop (PLL), which 
multiplies the system clock to create an internal processor clock, PClk. 


The R10000 processor always communicates with the system at the SysClk frequency, and 
PClk always runs at a frequency-multiple of SysClk, according to the following formula: 


PClk = SysClk × (SysClkDiv + 1) / 2 


For example, in a 50 MHz system with SysClkDiv = 7 and SCClkDiv = 2, 
PClk = 50 × 8 / 2 = 200 MHz. 


NOTE: It is preferred that the R10000 processor uses a differential PECL clock input. 
However, in a less-aggressive system, a CMOS/TTL single-ended clock can be used 
to drive the processor, provided its complementary clock input, SysClk*, is tied to an 
appropriate reference voltage (1.4V for TTL, Vcc/2 for CMOS). In any case, the 
reference voltage applied to SysClk* should not be less than 1.2V. 


7.2 Secondary Cache Clock 


Errata 


The processor uses registered synchronous SRAMs for its secondary cache, to allow 
pipelined accesses. 


The processor provides 6 pairs of differential clock outputs, SCClk(5:0) and SCClk*(5:0), 
to be used by the secondary cache synchronous SRAMs. These outputs swing between 
VccQSC and Vss. The SCClkTap mode bits (mode bits are described in Chapter 8, the 
section titled “Mode Bits”) specify the alignment of SCClk(5:0) and SCClk*(5:0) relative 
to the internal secondary cache clock. Note that the output buffer delay is not included. 


The secondary cache interface clock is generated by dividing down the internal processor 
clock, PClk. 


SCClk is related to SysClk according to the following formula: 
SCClk = SysClk × (SysClkDiv + 1) / (SCClkDiv + 1) 


For example, in a 50 MHz system with SysClkDiv = 7 and SCClkDiv = 2, 
SCClk = 50 × 8 / 3 ≈ 133 MHz. 
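

The two clock formulas in this chapter can be checked with a short calculation. The sketch below uses exact fractions so that non-integer results such as 400/3 MHz are not rounded prematurely; the function names are hypothetical, and the SysClkDiv and SCClkDiv arguments are the raw mode bit values, not the division ratios.

```python
from fractions import Fraction


def pclk_mhz(sysclk_mhz, sysclk_div):
    """PClk = SysClk x (SysClkDiv + 1) / 2."""
    return Fraction(sysclk_mhz) * (sysclk_div + 1) / 2


def scclk_mhz(sysclk_mhz, sysclk_div, scclk_div):
    """SCClk = SysClk x (SysClkDiv + 1) / (SCClkDiv + 1)."""
    return Fraction(sysclk_mhz) * (sysclk_div + 1) / (scclk_div + 1)


if __name__ == "__main__":
    # The example from the text: 50 MHz SysClk, SysClkDiv = 7, SCClkDiv = 2.
    print(float(pclk_mhz(50, 7)))      # internal processor clock in MHz
    print(float(scclk_mhz(50, 7, 2)))  # secondary cache clock in MHz
```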


7.3 Phase-Locked-Loop 


The processor uses the internal PLL for clock generation and multiplication as shown in 
Figure 7-1. 


Values of the termination resistors for the SysClkRet/SysClkRet* signals are 
system-dependent. The system designer must select a value based upon the characteristic 
impedance of the board; specifying values for these termination resistors is therefore 
beyond the scope of this manual. 


Figure 7-1 R10000 System and Secondary Cache Clock Interface 


8. Initialization 


This section describes initialization of the R10000 processor, including initialization of 
logical registers. 


Initialization of the processor occurs during a reset sequence. The processor supports three 
separate reset sequences: 


• Power-on reset 
• Cold reset 
• Soft reset 


These sequences are described in this chapter. 


Also described are the mode bits. 


8.1 Initialization of Logical Registers 


After a power-on or cold reset sequence, all logical registers (both in the integer and the 
floating-point register files) must be written before they can be read. Failure to write any 
of these registers before reading from them will have an unpredictable result. 


8.2 Power-On Reset Sequence 
The Power-on Reset sequence is used to reset the processor after the initial power-on, or 
whenever power or SysClk is interrupted. 


The Power-on Reset sequence is as follows: 


• The external agent negates DCOk. 
• The external agent asserts SysReset*. 
• The external agent negates SysGnt*. 
• The external agent negates SysRespVal*. 


• Once Vcc, VccQ[SC,Sys], Vref[SC,Sys], Vcc[Pa,Pd], and SysClk stabilize, 
the external agent waits at least 1 ms and then asserts DCOk. 


At this time, the System interface resides in slave state and all internal state is 
initialized. 

The SysClkDiv mode bits default to divide-by-1. 

The SCClkDiv mode bits default to divide-by-3. 


• After waiting at least 100 ms for the internal clocks to stabilize, the external 
agent loads the mode bits into the processor by driving the mode bits on 
SysAD[63:0], waiting at least two SysClk cycles, and then asserting SysGnt* 
for at least one SysClk cycle. 


• After waiting at least another 100 ms for the internal clocks to restabilize, the 
external agent synchronizes all clocks internal to the processor. This is 
performed by asserting SysRespVal* for one SysClk cycle. 


• After waiting at least 100 ms for the internal clocks to again restabilize (a 
third 100 ms restabilization period), the external agent negates SysReset*. 


• The external agent must retain mastership of the System interface, refrain 
from issuing external requests or nonmaskable interrupts, and ignore the 
system state bus until the processor asserts SysReq*. The assertion of 
SysReq* indicates the processor is ready for operation. In a cluster 
arrangement, all processors must assert SysReq*, indicating they are ready for 
operation. 


Errata 

If the virtual SysClk is used during the reset sequence, the mode bits, SysGnt*, 
SysRespVal*, and SysReset* should all be referenced to the virtual SysClk that is created 
with SysCyc*. This approach will cause the R10000 to come out of reset synchronously 
with the virtual SysClk, which will allow repeatable and lock-step operation (see Chapter 
6, the section titled “Support for Hardware Emulation,” for description of virtual SysClk 
operation). 


During a Power-on Reset sequence, all internal state is initialized. A Power-on Reset 
sequence causes the processor to start with the Reset exception. 


Figure 8-1 shows the Power-on Reset sequence. 
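

The Power-on Reset steps and their minimum waits can be written down as data, which makes it easy to estimate a lower bound on the total reset time. This is a rough sketch only: the Step class, the step descriptions, and the function name are hypothetical, and the total counts only the explicit minimum waits named in the sequence, not supply settling time or the time the processor takes to assert SysReq*.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    min_wait_ms: float = 0.0   # minimum wait before the action, in ms
    min_wait_cycles: int = 0   # minimum wait before the action, in SysClk cycles


POWER_ON_RESET = [
    Step("negate DCOk, assert SysReset*, negate SysGnt* and SysRespVal*"),
    Step("assert DCOk", min_wait_ms=1.0),  # after supplies and SysClk stabilize
    Step("drive mode bits on SysAD[63:0]", min_wait_ms=100.0),
    Step("assert SysGnt* for at least one SysClk cycle", min_wait_cycles=2),
    Step("assert SysRespVal* for one SysClk cycle", min_wait_ms=100.0),
    Step("negate SysReset*", min_wait_ms=100.0),
    Step("hold mastership until the processor asserts SysReq*"),
]


def minimum_wait_ms(steps, sysclk_mhz):
    """Sum the explicit minimum waits, converting SysClk cycles to ms."""
    cycle_ms = 1.0 / (sysclk_mhz * 1000.0)
    return sum(s.min_wait_ms + s.min_wait_cycles * cycle_ms for s in steps)


if __name__ == "__main__":
    for number, step in enumerate(POWER_ON_RESET, start=1):
        print(number, step.action)
    print(f"explicit waits: at least {minimum_wait_ms(POWER_ON_RESET, 50):.5f} ms")
```

The three 100 ms restabilization periods dominate the total, so the SysClk frequency has almost no effect on the result.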


Figure 8-1 Power-On Reset Sequence 


8.3 Cold Reset Sequence 


The Cold Reset sequence is used to reset the entire processor, and possibly alter the mode 
bits while power and SysClk are stable. 


The Cold Reset sequence is as follows: 
• The external agent negates SysGnt* and SysRespVal*. 


• After waiting at least one SysClk cycle, the external agent asserts SysReset*. 


• After waiting at least 100 ms, the external agent loads the mode bits into the 
R10000. This is performed by driving the mode bits on SysAD[63:0], waiting 
at least two SysClk cycles, and then asserting SysGnt* for at least one 
SysClk cycle. 


• After waiting at least another 100 ms for the internal clocks to restabilize, the 
external agent synchronizes all processor internal clocks by asserting 
SysRespVal* for one SysClk cycle. 


• After waiting at least 100 ms for the internal clocks to again restabilize (a 
third 100 ms restabilization period), the external agent negates SysReset*. 


• The external agent must retain mastership of the System interface, refrain 
from issuing external requests or nonmaskable interrupts, and ignore the 
system state bus until the processor asserts SysReq*. The assertion of 
SysReq* indicates the processor is ready for operation. In a cluster 
arrangement, all processors must assert SysReq*, indicating they are ready for 
operation. 


During a Cold Reset sequence all processor internal state is initialized. A Cold Reset 
sequence causes the processor to start with a Reset exception. 


Figure 8-2 shows the cold reset sequence. 


Figure 8-2 Cold Reset Sequence 


8.4 Soft Reset Sequence 


A Soft Reset sequence is used to reset the external interface of the processor without 
altering the mode bits while power and SysClk are stable. 


The Soft Reset sequence is as follows: 
• The external agent negates SysGnt* and SysRespVal*. 


• After waiting at least one SysClk cycle, the external agent asserts SysReset* 
for at least 16 SysClk cycles. 


• The external agent must retain mastership of the System interface, refrain 
from issuing external requests or nonmaskable interrupts, and ignore the system 
state bus until the processor asserts SysReq*. The assertion of SysReq* 
indicates the processor is ready for operation. In a cluster arrangement, all 
processors must assert SysReq*, indicating they are ready for operation. 


During a Soft Reset sequence, all external interface state is initialized. The internal and 
secondary cache clocks are not affected by a Soft Reset sequence. The general purpose, 
CP0, and CP1 registers are preserved, as well as the primary and secondary caches. 


A Soft Reset sequence causes a Soft Reset exception, in which the Soft Reset exception 
handler executes instructions from uncached space and uses CACHE instructions to 
analyze and dump the contents of the primary and secondary caches. To resume normal 
operation, a Cold Reset sequence must be initiated. 


Figure 8-3 presents the Soft Reset sequence. 


Figure 8-3 Soft Reset Sequence 


8.5 Mode Bits 


The R10000 processor uses mode bits to configure the operation of the microprocessor. 
These mode bits are loaded into the processor from the SysAD[63:0] bus during a 
power-on or cold reset sequence while SysGnt* is asserted. The SysADChk[7:0] bus does not 
have to contain correct ECC during mode bit initialization. During the reset sequence, the 
mode bits obtained from SysAD[24:0] are written into bits 24:0 of the CP0 Config register. 


The mode bits are described in Table 8-1. 


Table 8-1 Mode Bits 

SysAD Bit  Name and Function                            Value  Mode Setting (R10000, R12000) 

2:0        Kseg0CA                                      0      Reserved 
           Specifies the kseg0 cache algorithm.         1      Reserved 
                                                        2      Uncached 
                                                        3      Cacheable noncoherent 
                                                        4      Cacheable coherent exclusive 
                                                        5      Cacheable coherent exclusive on write 
                                                        6      Reserved 
                                                        7      Uncached accelerated 

4:3        DevNum                                       0-3 
           Specifies the processor device number. 

5          CohPrcReqTar                                 0      External agent only 
           Specifies the target of processor            1      Broadcast 
           coherent requests issued on the System 
           interface by the processor. 

6          PrcElmReq                                    0      Disable 
           Specifies whether to enable processor        1      Enable 
           eliminate requests onto the System 
           interface by the processor. 

8:7        PrcReqMax                                    0      1 outstanding processor request 
           Specifies the maximum number of              1      2 outstanding processor requests 
           outstanding processor requests allowed       2      3 outstanding processor requests 
           on the System interface by the               3      4 outstanding processor requests 
           processor. 


Table 8-1 (cont.) Mode Bits 

SysAD Bit  Name and Function                            Value  Mode Setting 
                                                               R10000                       R12000 

12:9       SysClkDiv                                    0      Reserved                     Reserved 
           Sets PClk to SysClk ratio; determines        1      Result of division by 1      NOT AVAILABLE 
           the System interface clock frequency;        2      Result of division by 1.5    NOT AVAILABLE 
           see Chapter 7, the section titled            3      Result of division by 2      Result of division by 2 
           “System Interface Clock and Internal         4      Result of division by 2.5    Result of division by 2.5 
           Processor Clock Domains”                     5      Result of division by 3      Result of division by 3 
                                                        6      Result of division by 3.5    Result of division by 3.5 
                                                        7      Result of division by 4      Result of division by 4 
                                                        8      Reserved                     Result of division by 4.5 
                                                        9      Reserved                     Result of division by 5 
                                                        A      Reserved                     Result of division by 5.5 
                                                        B      Reserved                     Result of division by 6 
                                                        C      Reserved                     Result of division by 7 
                                                        D      Reserved                     Result of division by 8 
                                                        E      Reserved                     Result of division by 9 
                                                        F      Reserved                     Result of division by 10 

13         SCBlkSize                                    0      16-word 
           Specifies the secondary cache block size.    1      32-word 

14         SCCorEn                                      0      Retry access through corrector 
           Specifies the method of correcting           1      Always access through corrector 
           secondary cache data array ECC errors. 

15         MemEnd                                       0      Little endian 
           Specifies the memory system endianness.      1      Big endian 

18:16      SCSize                                       0      512 Kbyte 
           Specifies the size of the secondary cache.   1      1 Mbyte 
                                                        2      2 Mbyte 
                                                        3      4 Mbyte 
                                                        4      8 Mbyte 
                                                        5      16 Mbyte 
                                                        6      Reserved 
                                                        7      Reserved 

21:19      SCClkDiv                                     0      Reserved                     Reserved 
           Sets PClk to SCClk ratio; determines the     1      Result of division by 1      NOT AVAILABLE 
           secondary cache clock frequency; see         2      Result of division by 1.5    Result of division by 1.5 
           Chapter 7, the section titled “System        3      Result of division by 2      Result of division by 2 
           Interface Clock and Internal Processor       4      Result of division by 2.5    Result of division by 2.5 
           Clock Domains”                               5      Result of division by 3      Result of division by 3 
                                                        6      Reserved                     Reserved 
                                                        7      Reserved                     Result of division by 4 

24:22      Reserved                                     0      Reserved 
                                                        1      Reserved 
                                                        2      Reserved 
                                                        3      Reserved 
                                                        4      Delay Speculative Dirty - fix for speculative store† 
                                                        5      Reserved 
                                                        6      Reserved 
                                                        7      Reserved 


Table 8-1 (cont.) Mode Bits 

SysAD Bit  Name and Function                            Value  Mode Setting (R10000, R12000) 

28:25      SCClkTap                                     0      SCClk same phase as internal clock 
           Specifies the alignment‡ of SCClk[5:0]       1      SCClk 1/12 PClk period earlier than internal clock 
           and SCClk*[5:0] relative to the internal     2      SCClk 2/12 PClk period earlier than internal clock 
           secondary cache clock.                       3      SCClk 3/12 PClk period earlier than internal clock 
                                                        4      SCClk 4/12 PClk period earlier than internal clock 
                                                        5      SCClk 5/12 PClk period earlier than internal clock 
                                                        6      undefined 
                                                        7      undefined 
                                                        8      SCClk 6/12 PClk period earlier than internal clock 
                                                        9      SCClk 7/12 PClk period earlier than internal clock 
                                                        A      SCClk 8/12 PClk period earlier than internal clock 
                                                        B      SCClk 9/12 PClk period earlier than internal clock 
                                                        C      SCClk 10/12 PClk period earlier than internal clock 
                                                        D      SCClk 11/12 PClk period earlier than internal clock 
                                                        E      undefined 
                                                        F      undefined 

29         Reserved                                     0 

30         ODrainSys                                    0      Push-pull 
           Specifies whether or not to configure        1      Open drain 
           select* System interface bidirectional and 
           output signals as open drain. 

31         CTM                                          0      Disable 
           Specifies whether or not to enable cache     1      Enable 
           test mode. 

63:32      Reserved                                     0 


† The Boot Mode bit 24 corresponds to the Config register[24] bit, which controls DSD during kernel and supervisor modes. However, the DSD mode can 
also be enabled in the user mode by setting the Status register[24] bit. Config register[24] is read-only and can be set only at boot time. 


If the DSD mode is set: 


a) The R12000 will not set the Dirty bit for a secondary cache block until the store instruction is the oldest in the Active List and is about to be executed. (An 
interrupt could cause a case where the dirty bit is set (store is no longer speculative), but the store does not immediately graduate. We believe this case 
should not cause any problem. This mode does prevent speculative stores from setting the dirty bit.) 


b) This mode will have slightly lower performance due to the delay in the setting of the Dirty bit. This delay will occur just once per block refill from main 
memory, when it is necessary to set the dirty bit. Setting the bit requires about ten cycles, but usually the processor will continue to overlap execution of 
other instructions. Once a block becomes dirty in secondary cache, this mode has no performance effect. 


c) In this mode, a miss in secondary cache, due to a store instruction which is not already the oldest in the pipeline, will cause a refill to the “clean exclusive” 
state. A hit to a shared line will immediately cause an upgrade to “clean exclusive”. Thus, bus operations (which are relatively slow) will still begin 
speculatively. 


Independent of the DSD mode, the R12000 will delay a “cached, non-coherent” load until it is the oldest instruction. This change is implemented because a 
speculative load accessing an unmapped “xkphys” address as “cached, non-coherent” might bring data into the secondary cache without the proper 
coherency checks. 


The R12000 makes no changes to prevent itself from speculatively refilling cache lines in the shared or clean states, except for the “xkphys” case described above. 


‡ Does not include the output buffer delay. 


* SysReq*, SysRel*, SysCmd[11:0], SysCmdPar, SysAD[63:0], SysADChk[7:0], SysVal*, SysState[2:0], SysStatePar, SysStateVal*, 
SysCorErr*, SysUncErr* 


Errata 


The description of bits 28:25 of Table 8-1 has been revised. 
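

To show how the fields of Table 8-1 combine into the word driven on SysAD[63:0] during mode bit loading, the following sketch packs a subset of the named fields into an integer. The bit positions are as read from Table 8-1; the dictionary, the function name, and the example settings are hypothetical, and reserved fields are simply left at zero.

```python
# (least significant bit, width) for a subset of the Table 8-1 fields.
MODE_FIELDS = {
    "Kseg0CA": (0, 3),
    "CohPrcReqTar": (5, 1),
    "PrcElmReq": (6, 1),
    "PrcReqMax": (7, 2),
    "SysClkDiv": (9, 4),
    "SCBlkSize": (13, 1),
    "MemEnd": (15, 1),
    "SCSize": (16, 3),
    "SCClkDiv": (19, 3),
    "SCClkTap": (25, 4),
    "ODrainSys": (30, 1),
    "CTM": (31, 1),
}


def pack_mode_bits(**settings):
    """Combine field values into the 64-bit mode word (unset fields are 0)."""
    word = 0
    for name, value in settings.items():
        lsb, width = MODE_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << lsb
    return word


if __name__ == "__main__":
    # Example: cacheable noncoherent kseg0, PClk = 4 x SysClk, SCClk = PClk / 1.5,
    # big-endian memory, 1 Mbyte secondary cache.
    word = pack_mode_bits(Kseg0CA=3, SysClkDiv=7, SCClkDiv=2, MemEnd=1, SCSize=1)
    print(f"SysAD[63:0] = 0x{word:016x}")
```

Range checking each value against its field width catches a setting that would otherwise spill into a neighbouring field.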


9. Error Protection and Handling 


This chapter presents the error protection and handling features provided by the R10000 
processor. 


Two types of errors can occur in an R10000 system: 
• correctable 
• uncorrectable 


The following two sections describe them. 


9.1 Correctable Errors 


Correctable errors consist of: 
• secondary cache tag array correctable ECC errors 
• secondary cache data array correctable ECC errors 
• System interface address/data bus correctable ECC errors 


When the processor detects a correctable error, the error is automatically corrected, and 
normal operation continues. Secondary cache array scrubbing is not performed. 


The processor informs the external agent that a correctable error was detected and then 
corrected by asserting the SysCorErr* signal for one SysClk cycle. 


9.2 Uncorrectable Errors 


Uncorrectable errors consist of: 
• Primary instruction cache array parity errors 
• Primary data cache array parity errors 
• Secondary cache tag array uncorrectable ECC errors 
• Secondary cache data array uncorrectable ECC errors 
• System interface command bus parity errors 
• System interface address/data bus uncorrectable ECC errors 
• System interface response bus parity errors 


Errata 


When the processor detects an uncorrectable error, a Cache Error exception is posted. In 
general, the detection of an uncorrectable error does not disrupt any ongoing operations. 
However, the instruction fetch and load/store units never use data which contains an 
uncorrectable error. 


To inform the external agent, the processor asserts SysUncErr* for one SysClk cycle 
whenever any of the following uncorrectable errors are detected: 


• Primary instruction cache tag array parity errors 
• Primary data cache tag array parity errors 
• Secondary cache tag array uncorrectable ECC errors 
• System interface command bus parity errors 
• System interface address/data bus external address cycle uncorrectable ECC 
errors 
• System interface response bus parity errors 


The processor informs the external agent that an uncorrectable tag error has been detected 
by asserting SysUncErr* for one SysClk cycle. 


9.3 Propagation of Uncorrectable Errors 


The processor assists the external agent in limiting the propagation of uncorrectable errors 
in the following manner: 


• During external block data response cycles, if the data quality indication on 
SysCmd(5) is asserted, or if an uncorrectable ECC error is encountered on the 
system address/data bus while the ECC check indication on SysCmd(0) is 
asserted, the processor intentionally corrupts the ECC of the corresponding 
secondary cache quadword after receiving an external ACK completion 
response. 


• During processor data cycles, the processor asserts the data quality indication 
on SysCmd(5) if the data is known to contain uncorrectable errors. The 
System interface ECC is never intentionally corrupted; the SysCmd(5) bit is 
used to indicate corrupted data. 


• If an uncorrectable cache tag error is detected, the processor asserts 
SysUncErr* for one SysClk cycle. 


• An external coherency request that detects a secondary cache tag array 
uncorrectable error asserts the secondary cache block tag quality indication on 
SysState(2) during the corresponding processor coherency state response. 


• If an external coherency request requires a processor coherency data response, 
and a primary data cache tag parity error is encountered during the primary 
cache interrogation, or a secondary cache tag array uncorrectable error is 
encountered during the secondary cache interrogation, the processor asserts 
the data quality indication on SysCmd(5) for all doublewords of the 
corresponding processor coherency data response. 


9.4 Cache Error Exception 


The processor indicates an uncorrectable error has occurred by asserting a Cache Error 
exception. 


The following four internal units detect and report uncorrectable errors: 
• instruction cache 
• data cache 
• secondary cache 
• System interface 


Each of these four units maintains a unique local CacheErr register. 


A Cache Error exception is imprecise; that is, it is not associated with a particular 
instruction. When any of the four units posts a Cache Error exception, completed 
instructions are graduated before the Cache Error exception is taken. If there are Cache 
Error exceptions posted from more than one of the units, the exceptions are prioritized in 
the following order: 


1. instruction cache 
2. data cache 
3. secondary cache 
4. System interface 


The corresponding local CacheErr register is transferred to the CP0 CacheErr register and 
the CP0 Status register ERL bit is asserted. Instruction fetching begins from 0xa0000100 
or 0xbfc00300, depending on the CP0 Status register BEV bit. The CP0 ErrorEPC register 
is loaded with the virtual address of the next instruction that has not been graduated, so that 
execution can resume after the Cache Error exception handler completes. 
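

The prioritization and vector selection described above can be summarized in a few lines. This is a behavioral sketch, not a description of the hardware implementation: the function names are hypothetical, and the pairing of BEV = 0 with 0xa0000100 and BEV = 1 with 0xbfc00300 follows the usual MIPS convention that BEV selects the bootstrap vectors, which the text above does not spell out.

```python
# Units in decreasing priority order, as listed in the text.
PRIORITY = ["instruction cache", "data cache", "secondary cache", "System interface"]


def select_cache_error(posted_units):
    """Return the highest-priority unit among those that posted an exception."""
    for unit in PRIORITY:
        if unit in posted_units:
            return unit
    return None


def cache_error_vector(bev: int) -> int:
    """Assumed mapping: BEV = 1 selects the bootstrap vector."""
    return 0xBFC00300 if bev else 0xA0000100


if __name__ == "__main__":
    posted = {"secondary cache", "data cache"}
    print(select_cache_error(posted), hex(cache_error_vector(0)))
```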


When ERL=1, the user address region becomes a 2-Gbyte uncached space mapped directly 
to the physical addresses. This allows the Cache Error handler to save registers directly to 
memory without having to use a register to construct the address. 


The processor does not support nested Cache Error exception handling. While the CP0 
Status register ERL bit is asserted, any subsequent Cache Error exceptions are ignored. 
However, the detection of additional uncorrectable errors is not inhibited, and additional 
Cache Error exceptions may be posted.† 


† The hardware does not handle the case of multiple Cache Error exceptions in any special 
manner; caches are refilled as normal, and data forwarded to the appropriate functional units. 


9.5 CP0 CacheErr Register EW Bit 


When a unit detects an uncorrectable error, it records information about the error in its local 
CacheErr register and posts a Cache Error exception. If a subsequent uncorrectable error 
occurs while the unit is waiting for the Cache Error exception to be taken and for the transfer 
of the local CacheErr register to the CP0 CacheErr register to complete, the EW bit is set in 
its local CacheErr register. Once the Cache Error exception is taken, the EW bit in the CP0 
CacheErr register is set, and the Cache Error exception handler can determine that a 
second error has occurred. 


Once the CP0 CacheErr register EW bit is set, it can only be cleared by a reset sequence. 


9.6 CP0 Status Register DE Bit


Asserting the CP0 Status register DE bit suppresses the posting of future Cache Error
exceptions. All local CacheErr registers are also prevented from being updated. Unlike the 
R4400 processor architecture, when the DE bit is asserted, cache hits are not inhibited when 
an uncorrectable error is detected. Correctable errors are handled normally when the DE 
bit is set. 


NOTE: Be careful when setting this bit, since it may cause erroneous data and/or 
instructions to be propagated. 


9.7 CACHE Instruction


Uncorrectable error protection is suppressed for the Index Load Tag, Index Store Tag, Index
Load Data, and Index Store Data CACHE instruction variations. These four variations may
be used within a Cache Error exception handler to examine the cache tags and data without
the occurrence of further uncorrectable errors.


9.8 Error Protection Schemes Used by R10000


Error protection schemes used in the R10000 processor are:


* parity
* sparse encoding
* ECC


These schemes are described in this section, and listed in Table 9-1.


Table 9-1 Error Protection Schemes Used in the R10000 Processor


Error Detection Used    What is Protected
Parity                  Primary caches
                        Secondary cache data
                        System interface buses
Sparse encoding         Primary data cache state mod array
ECC (SECDED)            Secondary cache tag
                        Secondary cache data
                        System interface address/data bus


Parity


Parity is used to protect the primary caches and various System interface buses. The
processor uses both odd and even parity schemes:


* in an odd parity scheme, the total number of ones on the protected data and
  the corresponding parity bit should be odd

* in an even parity scheme, the total number of ones on the protected data and
  the corresponding parity bit should be even.
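
The two parity schemes can be illustrated with a short sketch. The function names and the (data, width) calling convention are assumptions made for illustration; they do not describe the processor's internal logic.

```python
def parity_bit(data, width, odd):
    """Return the parity bit for the low `width` bits of `data`.

    With odd=True the data bits plus the parity bit contain an odd
    number of ones; with odd=False the total is even.
    """
    ones = bin(data & ((1 << width) - 1)).count("1")
    even_bit = ones & 1              # this bit makes the total even
    return even_bit ^ 1 if odd else even_bit


def parity_ok(data, width, par, odd):
    """Check a received parity bit against the protected data."""
    return parity_bit(data, width, odd) == par
```

For a 4-bit value 0b1011 (three ones), the even parity bit is 1 and the odd parity bit is 0.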


Sparse Encoding 


A sparse encoding is used to protect the primary data cache state mod array. In such a 
scheme, valid encodings are chosen so that altering a single bit creates an invalid encoding. 
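
The property can be checked with a small sketch. The 3-bit code set below is hypothetical and chosen only for illustration; the actual state mod encodings are not given in this section.

```python
# Hypothetical set of valid 3-bit encodings (an assumption for illustration).
VALID = {0b001, 0b010, 0b100}


def is_valid(code):
    """Return True if `code` is one of the valid encodings."""
    return code in VALID


def single_bit_flips(code, width=3):
    """Return every value that differs from `code` in exactly one bit."""
    return [code ^ (1 << i) for i in range(width)]


def detects_all_single_bit_errors(valid=VALID, width=3):
    """True if no single-bit flip of a valid encoding is itself valid."""
    return all(f not in valid for c in valid for f in single_bit_flips(c, width))
```

With this code set, altering any one bit of a valid encoding always produces a value outside the set, so the error is detected.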


ECC 


An error correcting code (ECC) is used to protect the secondary cache tag, the secondary 
cache data, and the System interface address/data bus. A distinct single-bit error correction 
and double-bit error detection (SECDED) code is used for each of these three applications. 
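
How a SECDED code corrects single-bit errors and detects double-bit errors can be shown with a much smaller code than those used by the processor. The sketch below is not one of the R10000 matrices: it is a hypothetical code with 4 data bits and 4 check bits in which every column of the check matrix has an odd number of ones, and all names are illustrative.

```python
# Column of the check matrix for each data bit (each has three ones).
# Check-bit columns are the four single-one values 0b0001 .. 0b1000.
DATA_COLS = (0b0111, 0b1011, 0b1101, 0b1110)


def encode(data):
    """Return the 4 check bits for a 4-bit data value."""
    check = 0
    for j, col in enumerate(DATA_COLS):
        if (data >> j) & 1:
            check ^= col
    return check


def decode(data, check):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = encode(data) ^ check
    if syndrome == 0:
        return "ok", data
    if syndrome in DATA_COLS:                  # single data-bit error
        return "corrected", data ^ (1 << DATA_COLS.index(syndrome))
    if bin(syndrome).count("1") == 1:          # single check-bit error
        return "corrected", data
    return "uncorrectable", data               # even-weight syndrome: double error
```

A single-bit error yields a syndrome equal to one column of the matrix, which identifies the failing bit. Two errors yield the XOR of two odd-weight columns, which has an even number of ones and therefore matches no column.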


9.9 Primary Instruction Cache Error Protection and Handling


This section describes error protection and error handling schemes for the primary
instruction cache.


Error Protection


The primary instruction cache arrays have the following error protection schemes, as listed
in Table 9-2.


Table 9-2 Primary Instruction Cache Array Error Protection


Array          Width     Error Protection
Tag Address    27-bit    Even parity
Tag State      1-bit     Even parity
Data           36-bit    Even parity


Error Handling


All primary instruction cache errors are uncorrectable. If an error is detected, the
instruction cache unit posts a Cache Error exception and initializes the D, TA, TS, and PIdx
fields in the local CacheErr register (see Chapter 14, CacheErr Register (27), for more
information). If an error is detected on the tag address or state array, the processor informs
the external agent that an uncorrectable tag error was detected by asserting SysUncErr* for
one SysClk cycle.


9.10 Primary Data Cache Error Protection and Handling


This section describes error protection and error handling schemes for the primary data
cache.


Error Protection


The primary data cache arrays have the following error protection schemes, as listed in
Table 9-3.


Table 9-3 Primary Data Cache Array Error Protection


Array          Width     Error Protection
Tag Address    28-bit    Even parity
Tag State      3-bit     Even parity
Tag Mod        3-bit     Sparse encoding
Data           8-bit     Even parity
LRU            1-bit     None


Error Handling


All primary data cache errors are uncorrectable. If an error is detected, the data cache unit
posts a Cache Error exception and initializes the EE, D, TA, TS, TM, and PIdx fields in the
local CacheErr register (see Chapter 14, CacheErr Register (27), for more information). If
an error is detected on the tag address, state, or mod array, the processor informs the
external agent that an uncorrectable tag error was detected by asserting SysUncErr* for
one SysClk cycle.


9.11 Secondary Cache Error Protection and Handling


This section describes error protection and error handling schemes for the secondary cache.


Error Protection


The secondary cache arrays have the following error protection schemes, as listed in Table
9-4.


Table 9-4 Secondary Cache Array Error Protection


Array                         Width      Error Protection
Data                          128-bit    9-bit ECC + even parity
Tag                           26-bit     7-bit ECC
MRU (Way prediction table)    1-bit      None


Error Handling


This section describes error handling for the data array and the tag array. As shown in Table
9-4, errors are not detected for the way prediction table.


Data Array


The 128-bit wide secondary cache data array is protected by a 9-bit wide ECC. An even
parity bit for the 128 bits of data is used for rapid detection of correctable (single-bit) errors;
when a correctable parity error is detected, the data is sent through the data corrector. The
parity bit does not have any logical effect on the processor's ability to either detect or
correct errors.


Whenever the processor writes the secondary cache data array, it drives the proper ECC on
SCDataChk(8:0) and even parity on SCDataChk(9).


Data Array in Correction Mode


The secondary cache operates in correction mode when the SCCorEn mode bit is asserted.
Whenever the processor reads the secondary cache data array in correction mode, the data
is sent through a data corrector.


If a correctable error is detected, in-line correction is automatically made without affecting
latency. The processor informs the external agent that a correctable error was detected and
corrected by asserting SysCorErr* for one SysClk cycle.


If an uncorrectable error is detected, the secondary cache unit posts a Cache Error exception
and initializes the D and SIdx fields in the local CacheErr register (see Chapter 14,
CacheErr Register (27), for more information).


In correction mode, secondary-to-primary cache refill latency is increased by two PClk
cycles. Multiple processors, operating in a lock-step fashion, remain synchronized in the
presence of secondary cache data array correctable errors.


Table 9-5 presents the ECC matrix for the secondary cache data array. 


Table 9-5 ECC Matrix for Secondary Cache Data Array


[ECC matrix not recoverable in this copy. The matrix has nine rows, one per check bit
SCDataChk(8:0), and 137 columns: the nine check bits and data bits SCData(127:0). The
number of ones per row, in table order, is 54, 53, 54, 53, 53, 53, 54, 53, 54. Each
check-bit column contains a single one; each data-bit column contains three or five ones.]


Data Array in Noncorrection Mode


When the SCCorEn mode bit is negated, the secondary cache operates in noncorrection
mode. Whenever the processor reads the secondary cache data array in noncorrection
mode, it checks for even parity on SCDataChk(9). If a parity error is detected, it is
assumed that a correctable error has occurred, and the secondary cache block is again read
through a data corrector. During this re-read, the processor checks the SCDataChk(8:0)
bus for the proper ECC.


If a correctable error is detected, correction is automatically performed in-line. To inform
the external agent that a correctable error has been detected and corrected, the processor
asserts SysCorErr* for one SysClk cycle.


If an uncorrectable error is detected, the secondary cache unit posts a Cache Error exception
and initializes the D and SIdx fields in the local CacheErr register.


Secondary cache data array correctable errors are monitored with Performance Counter 0.


Tag Array


The 26-bit-wide secondary cache tag array is protected by a 7-bit-wide ECC.
Table 9-6 presents the ECC matrix for the secondary cache tag array.


Table 9-6 ECC Matrix for Secondary Cache Tag Array


[ECC matrix not recoverable in this copy. The matrix has seven rows, one per check bit
SCTagChk(6:0), and 33 columns: the seven check bits and data bits SCTag(25:0). Each
check-bit column contains a single one; each data-bit column contains three ones.]


Whenever the processor reads the secondary cache tag array, it checks the SCTagChk(6:0) 
bus for the proper ECC. If a correctable error is detected, correction is automatically
performed in-line, without affecting latency. The processor asserts SysCorErr* for one
SysClk cycle to inform the external agent that a correctable error has been detected and
corrected. If an uncorrectable error is detected, the secondary cache unit posts a Cache
Error exception and initializes the TA and SIdx fields in the local CacheErr register. The
processor asserts SysUncErr* for one SysClk cycle to inform the external agent that an 
uncorrectable tag error has been detected. 


Whenever the processor writes the secondary cache tag array, it drives the proper ECC on 
the SCTagChk(6:0) bus. 


9.12 System Interface Error Protection and Handling


This section describes error protection and error handling schemes for the System interface.


Error Protection


The System interface buses have the following error protection schemes, as listed in Table
9-7.


Table 9-7 System Interface Bus Error Protection


Bus         Width     Error Protection
SysCmd      12-bit    Odd parity
SysAD       64-bit    8-bit ECC
SysState    3-bit     Odd parity
SysResp     5-bit     Odd parity

Error Handling


This section describes error handling on the system command bus, system address/data bus,
system state bus, and system response bus.


SysCmd(11:0) Bus


The 12-bit wide system command bus, SysCmd(11:0), is protected by odd parity.


Whenever the processor is in master state and it asserts SysVal* to indicate that it is driving
valid information on the SysCmd(11:0) bus, it also drives odd parity on the SysCmdPar
signal.


Whenever the processor is in slave state and an external agent asserts SysVal* to indicate
that it is driving valid information on the SysCmd(11:0) bus, the processor checks the
SysCmdPar signal for odd parity. If a parity error is detected, the processor ignores the
SysCmd(11:0) and SysAD(63:0) buses for one SysClk cycle. The System interface unit
posts a Cache Error exception and sets the SC bit in the local CacheErr register.
Additionally, the processor informs the external agent by asserting SysUncErr* for one
SysClk cycle.


Caution: By ignoring the SysCmd(11:0) and SysAD(63:0) buses, the processor
may become unsynchronized with other processors or the external agent on the
cluster bus.


SysAD(63:0) Bus 
The 64-bit wide system address/data bus, SysAD(63:0), is protected by an 8-bit-wide ECC. 


Processor in Master State 


Whenever the processor is in master state and it asserts SysVal* to indicate it is driving 
valid information on the SysAD(63:0) bus, it also drives the proper ECC on the 
SysADChk(7:0) bus. 


Processor in Slave State 


Whenever the processor is in slave state, error checking is enabled with the assertion of 
SysCmd(0), and an external agent asserts SysVal* to indicate it is driving valid information 
on the SysAD(63:0) bus, the processor checks the SysADChk(7:0) bus for the proper ECC. 


Correctable Error Detected 


If a correctable error is detected during an external address cycle, or during an external data 
cycle for a processor read or upgrade request originated by the R10000 processor, 
correction is automatically performed in-line without affecting latency. The processor 
asserts SysCorErr* for one SysClk cycle to inform the external agent that a correctable 
error has been detected and corrected. 


Uncorrectable Error Detected


If an uncorrectable error is detected during an external address cycle, the processor ignores
the SysCmd(11:0) and SysAD(63:0) buses for one SysClk cycle, and the System interface
unit posts a Cache Error exception and sets the SA bit in the local CacheErr register.
Additionally, the processor informs the external agent by asserting SysUncErr* for one
SysClk cycle.


Caution: By ignoring the SysCmd(11:0) and SysAD(63:0) buses, this processor 
may become unsynchronized with other processors or the external agent on the 
cluster bus. 


If an uncorrectable error is detected or the data quality indication on SysCmd(5) is asserted 
during an external data cycle for a processor read or upgrade request originated by the 
processor, the R10000 asserts the corresponding incoming buffer uncorrectable error flag. 


When the processor forwards block data from an incoming buffer entry after receiving an 
external ACK completion response, the associated incoming buffer uncorrectable error 
flags are checked, and if any are asserted, the System interface unit posts a single Cache 
Error exception and initializes the EE, D, and SIdx fields in the local CacheErr register. 


When the processor forwards double/single/partial-word data from an incoming buffer 
entry after receiving an external ACK completion response, the associated incoming buffer 
uncorrectable error flag is checked and, if asserted, the System interface unit posts a Bus 
Error exception. 
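
The difference between the two forwarding cases can be summarized in a short sketch. The function name, the 'block' and 'word' labels, and the list of flags are illustrative assumptions rather than part of the System interface definition.

```python
def exception_on_forward(kind, error_flags):
    """Return the exception posted when incoming buffer data is forwarded.

    kind        -- 'block' for block data, 'word' for
                   double/single/partial-word data (labels are assumptions)
    error_flags -- incoming buffer uncorrectable error flags for the entry
    """
    if not any(error_flags):
        return None
    # Block data: a single Cache Error exception, however many flags are set.
    # Double/single/partial-word data: a Bus Error exception.
    return "Cache Error" if kind == "block" else "Bus Error"
```

In this model, a block response with several flagged data cycles still results in one Cache Error exception, while a flagged word response results in a Bus Error exception.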


Table 9-8 presents the ECC matrix for the System interface address/data bus. This ECC
matrix is identical to that used by the R4X00 System interface.


Table 9-8 ECC Matrix for System Interface Address/Data Bus


[ECC matrix not recoverable in this copy. The matrix has eight rows, one per check bit
SysADChk(7:0), and 72 columns: the eight check bits and data bits SysAD(63:0). Each
check-bit column contains a single one; each data-bit column contains three or five ones.]


SysState(2:0) Bus


The 3-bit wide system state bus, SysState(2:0), is protected by odd parity. The processor
drives odd parity on the SysStatePar signal.


SysResp(4:0) Bus


The 5-bit wide system response bus, SysResp(4:0), is protected by odd parity.


Whenever an external agent asserts SysRespVal* to indicate it is driving valid information
on the SysResp(4:0) bus, the processor checks the SysRespPar signal for odd parity. If a
parity error is detected, the processor ignores the SysResp(4:0) bus for one SysClk cycle.
The System interface unit posts a Cache Error exception and sets the SR bit in the local
CacheErr register. Additionally, the processor informs the external agent by asserting
SysUncErr* for one SysClk cycle.


Caution: If the processor ignores the SysResp(4:0) bus, it may become
unsynchronized with other processors or the external agent on the cluster bus. Also,
the processor will "hang" if a parity error is detected on the SysResp(4:0) bus during
an external completion response cycle for a processor double/single/partial-word
read request originated by the processor. The external agent may initiate a Soft Reset
sequence to obtain the contents of the CacheErr register, and the CacheErr register
will indicate a System interface uncorrectable system response bus error.


Protocol Observation


The processor continuously observes the protocol on the System interface.
Table 9-9 presents the supported protocol observations and the associated error handling
sequence.


Table 9-9 Protocol Observation


Protocol Observation                                  Error Handling

External response data cycle with an unexpected       Ignore the external response
request number during an external block data          data cycle
response for a processor block read or upgrade
request originated by the processor.

External block data response specifying a Reserved    Override the cache block state
cache block state for a processor block read or       to CleanExclusive
upgrade request originated by the processor.

External block data response specifying a Shared      Override the cache block state
cache block state for a processor coherent block      to CleanExclusive
read exclusive or upgrade request originated by
the processor.

External completion response specifying a Reserved    Ignore the external completion
completion indication.                                response

External ACK completion response for a processor      Override the external ACK
read request originated by the processor that has     completion response to a NACK
not received an external data response.


10. JTAG Interface Operation


The JTAG interface is implemented according to the standard IEEE 1149.1 test access port
protocol specifications.


The JTAG interface accesses the JTAG controller and instruction register as well as a
boundary scan register. The JTAG operation does not require DCOk to be asserted or
SysClk to be running; however, if DCOk is asserted, SysClk must run at the specified
minimum frequency or the core logic may be damaged.


10.1 Test Access Port (TAP)


The test access port (TAP) consists of four interface signals. These signals are used to
control the serial loading and unloading of instructions and test data, as well as to execute
tests.


The TAP consists of the following signals:


JTDI: Serial data input (Input signal)
JTDO: Serial data output (Output signal)
JTMS: Mode select (Input signal)
JTCK: Clock (Input signal)


The timing and the relationship of the TAP signals follow the IEEE 1149.1 standard
protocol.


TAP Controller


The R10000 processor implements the 16-state TAP controller specified by the IEEE
1149.1 standard in the following manner:


* The JTMS signal operates the state machine synchronized by the JTCK
  signal.

* The TAP controller is reset by keeping the JTMS signal asserted through five
  consecutive edges of JTCK. This reset condition sets the reset state of the
  controller. The TAP controller is also reset by asserting SysReset*. This pin
  must not be asserted while using the boundary scan register.
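
The reset property of the 16-state controller can be checked with a small model. The transition table below follows the IEEE 1149.1 state diagram; the Python names are illustrative, and the model counts one step per JTCK clock with JTMS sampled as 0 or 1.

```python
# Next state for each TAP controller state: (JTMS = 0, JTMS = 1).
NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def run(state, tms_bits):
    """Clock the controller once per entry in tms_bits; return the final state."""
    for tms in tms_bits:
        state = NEXT[state][tms]
    return state


def resets_from_any_state(clocks=5):
    """True if holding JTMS at 1 for `clocks` clocks always reaches reset."""
    return all(run(s, [1] * clocks) == "Test-Logic-Reset" for s in NEXT)
```

Holding JTMS at 1 for five clocks reaches Test-Logic-Reset from every state; four clocks are not enough from states such as Shift-DR.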


10.2 Instruction Register


The JTAG instruction register is four bits wide, permitting a total of 16 instructions to
control the selection of the bypass register, the boundary scan register, and other data
registers.


The encoding of the instruction register is given in Table 10-1:


Table 10-1 JTAG Instruction Register Encoding


MSB...LSB       Selected Data Register
0000            Boundary Scan Register
0001            Sample - Preload
0010 to 1110    Data Register (not used)
1111            Bypass Register


The 0001 value is provided to represent sample-preload, but also selects the boundary scan
register.


During a reset of the TAP controller, the value 1111 is loaded into the parallel output of the
instruction register, thus selecting the bypass register as the default.


During the Shift-IR state of the TAP controller, data is shifted serially into the instruction
register from JTDI, and the LSB of the instruction register is shifted out onto JTDO.


During the Update-IR state, the current state of the instruction register is shifted to its
parallel output for decoding.


10.3 Bypass Register


The bypass register is 1 bit wide.


When the bypass register is selected and the TAP controller is in the Shift-DR state, data on
JTDI is shifted into the bypass register and the output of the bypass register is shifted out
onto JTDO.
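
The instruction encoding of Table 10-1 and the 1-bit bypass path can be modeled as follows. This is a sketch for illustration: the function and class names are assumptions, and the initial bypass bit value of 0 is an assumption of the model rather than a statement from this section.

```python
RESET_INSTRUCTION = 0b1111  # loaded during a reset of the TAP controller


def selected_register(ir):
    """Return the data register selected by a 4-bit instruction value."""
    ir &= 0xF
    if ir in (0b0000, 0b0001):       # 0001 (sample-preload) also selects it
        return "boundary scan"
    if ir == 0b1111:
        return "bypass"
    return "data register (not used)"  # 0010 through 1110


class BypassRegister:
    """One-bit shift stage between JTDI and JTDO."""

    def __init__(self):
        self.bit = 0  # assumed initial value

    def shift(self, jtdi):
        """Shift one bit in from JTDI; return the bit shifted out on JTDO."""
        jtdo, self.bit = self.bit, jtdi & 1
        return jtdo
```

With the bypass register selected, each bit presented on JTDI appears on JTDO one shift later, so the sequence 1, 0, 1, 1 is observed as 0, 1, 0, 1 in this model.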


10.4 Boundary Scan Register


The boundary scan data register is selected by loading 0000 into the instruction register. 
The Shift-DR, Update-DR, and Capture-DR states of the TAP controller are used to operate 
the boundary scan register according to the IEEE 1149.1 standard specifications. 


The boundary scan register provides serial access to each of the processor interface pins, as 
shown in Figure 10-1. Hence, the boundary scan register can be used to load and observe 
specific logic values on the processor pins. 


[Figure: boundary scan cells placed between the integrated circuit and each IC package pin]


Figure 10-1 JTAG Boundary Scan Cells


The main application of the boundary scan register is board-level interconnect testing. 


The use of the boundary scan register for applying data to and capturing data from the 
internal microprocessor circuitry is not supported. 


The boundary scan register list for rev 1.2 of the fab is given in Table 10-2. The TriState 
signal will be eliminated from the BSR in rev 2.0 of the fab, and beyond. 


An additional bit is provided in the boundary scan register to control the direction of 
bidirectional pins. As it is loaded through JTDI, this bit is the first bit in the boundary scan 
chain. The logic value of this bit is latched during the Update-DR state, and sets the 
direction of all bidirectional pins as follows: 


Value Direction 
0 Input 
1 Output 


The value is set to 0 during reset, setting all bidirectional pins to input prior to any boundary 
scan operations. 


Table 10-2. Boundary Scan Register Pinlist, rev 1.2


† Will be eliminated after rev. 1.2.


Signal Signal Signal Signal Signal Signal
1. SCDataChk[1] 2. SCData[63] 3. SCData[62] 4. SCData[61] 5. SCData[60] 6. SCData[59]
7. SCData[58] 8. SCData[57] 9. SCData[56] 10. SCData[55] 11. SCData[54] 12. SCData[53]
13. SCData[52] 14. SCData[51] 15. SCData[50] 16. SCData[49] 17. SCData[48] 18. SCData[47] 
19. SCData[46] 20. SCData[45] 21. SCData[44] 22. SCData[43] 23. SCData[42] 24. SCData[41] 
25. SCData[40] 26. SCData[39] 27. SCData[38] 28. SCData[37] 29. SCData[36] 30. SCData[35] 
31. SCData[34] 32. SCData[33] 33. SCData[32] 34. SysAD[0] 35. SysAD[1] 36. SysAD[2] 
37. SysAD[3] 38. SysAD[4] 39. SysAD[5] 40. SysAD[6] 41. SysAD[7] 42. SysAD[8] 
43. SysAD[9] 44. SysAD[10] 45. SysAD[11] 46. SysAD[12] 47. SysAD[13] 48. SysAD[14] 
49. SysAD[15] 50. SCData[0] 51. SCData[1] 52. SCData[2] 53. SCData[3] 54. SCData[4] 
55. SCData[5] 56. SCData[6] 57. SCData[7] 58. SCData[8] 59. SCData[9] 60. SCData[10] 
61. SCData[11] 62. SCData[12] 63. SCData[13] 64. SCData[14] 65. SCData[15] 66. SCData[16]
67. SCData[17] 68. SCData[18] 69. SCData[19] 70. SCData[20] 71. SCData[21] 72. SCData[22]
73. SCData[23] 74. SCData[24] 75. SCData[25] 76. SCData[26] 77. SCData[27] 78. SCData[28]
79. SCData[29] 80. SCData[30] 81. SCData[31] 82. SCDataChk[0] 83. SCAAddr[18] 84. SCAAddr[17]
85. SCAAddr[16] 86. SCAAddr[15] 87. SCAAddr[14] 88. SCAAddr[13] 89. SCAAddr[12] 90. SCAAddr[11] 
91. SCAAddr[10] 92. SCAAddr[9] 93. SCDataChk[2] 94. SCDataChk[4] 95. SCData[64] 96. SCData[65] 
97. SCData[66] 98. SCData[67] 99. SCData[68] 100. SCData[69] 101. SCData[70] 102. SCData[71]
103. SCDataChk[9] 104. SysCyc* 105. SysAD[32] 106. SysAD[33] 107. SysAD[34] 108. SysAD[35] 
109. SysAD[36] 110. SysAD[37] 111. SysAD[38] 112. SysAD[39] 113. SysAD[40] 114. SysAD[41] 
115. SysAD[42] 116. SysAD[43] 117. SysAD[44] 118. SysAD[45] 119. SysAD[46] 120. SysAD[47] 
121. SCData[72] 122. SCData[73] 123. SCData[74] 124. SCData[75] 125. SCData[76] 126. SCData[77] 
127. SCData[78] 128. SCData[79] 129. SCAAddr[0] 130. SCAAddr[1] 131. SCAAddr[2] 132. SCAAddr[3] 
133. SCAAddr[4] 134. SCAAddr[5] 135. SCAAddr[6] 136. SCAAddr[7] 137. SCAAddr[8] 138. SCADWay 
139. SCADCS* 140. SCADOE* 141. SCADWr* 142. SCData[80] 143. SCData[81] 144. SCData[82] 
145. SCData[83] 146. SCData[84] 147. SCData[85] 148. SCData[86] 149. SCData[87] 150. SCData[88] 
151. SCData[89] 152. SCData[90] 153. SCData[91] 154. SCData[92] 155. SCData[93] 156. SCData[94] 
157. SCData[95] 158. SCDataChk[6] 159. SCDataChk[8] 160. Spare1 161. SCTCS* 162. SCTOE*
163. SCTWr* 164. SCTag[25] 165. SCTag[24] 166. SCTag[23] 167. SCTag[22] 168. SCTag[21] 
169. SCTag[20] 170. SCTag[19] 171. SCTag[18] 172. SCTag[17] 173. SCTag[16] 174. SCTag[15] 
175. SCTag[14] 176. SCTag[13] 177. SCTag[12] 178. SCTag[11] 179. SCTag[10] 180. SCTag[9] 
181. SCTag[8] 182. SCTag[7] 183. SCTag[6] 184. SCTag[5] 185. SCTag[4] 186. SCTag[3] 
187. SCTag[2] 188. SCTag[1] 189. SCTag[0] 190. SCTagLSBAddr 191. TriState† 192. SCTWay
193. SCTagChk[6] 194. SCTagChk[5] 195. SCTagChk[4] 196. SCTagChk[3] 197. SCTagChk[2] 198. SCTagChk[1] 
199. SCTagChk[0] 200. SysCmd[0] 201. SysCmd[1] 202. SysCmd[2] 203. SysCmd[3] 204. SysCmd[4] 
205. SysCmd[5] 206. SysCmd[6] 207. SysCmd[7] 208. SysCmd[8] 209. SysCmd[9] 210. SysCmd[10] 
211. SysCmd[11] 212. SysCmdPar 213. SysVal* 214. SysReq* 215. SysRel* 216. SysGnt*
217. SysReset* 218. SysRespVal* 219. SysRespPar 220. SysResp[4] 221. SysResp[3] 222. SysResp[2] 
223. SysResp[1] 224. SysResp[0] 225. SysGblPerf* 226. SysRdRdy* 227. SysWrRdy* 228. SysStateVal* 
229. SysStatePar 230. SysState[2] 231. SysState[1] 232. SysState[0] 233. SysCorErr* 234. SysUncErr* 
235. SysNMI* 236. SCDataChk[7] 237. SCDataChk[5] 238. SCData[127] 239. SCData[126] 240. SCData[125] 
241. SCData[124] 242. SCData[123] 243. SCData[122] 244. SCData[121] 245. SCData[120] 246. SCData[119] 
247. SCData[118] 248. SCData[117] 249. SCData[116] 250. SCData[115] 251. SCData[114] 252. SCData[113] 


Table 10-2 (cont.) Boundary Scan Register Pinlist, rev 1.2


‡ Will be eliminated after rev. 1.2.


Signal Signal Signal Signal Signal Signal
253. SCData[112] 254. SCBDWr* 255. SCBDOE* 256. SCBDCS* 257. SCBDWay 258. SCBAddr[8] 
259. SCBAddr[7] 260. SCBAddr[6] 261. SCBAddr[5] 262. SCBAddr[4] 263. SCBAddr[3] 264. SCBAddr[2] 
265. SCBAddr[1] 266. SCBAddr[0] 267. SCData[111] 268. SCData[110] 269. SCData[109] 270. SCData[108] 
271. SCTag[8] 272. SCTag[7] 273. SCTag[6] 274. SCTag[5] 275. SCTag[4] 276. SCTag[3] 
277. SCTag[2] 278. SCTag[1] 279. SCTag[0] 280. SCTagLSBAddr 281. TriState‡ 282. SCTWay
283. SCTagChk[6] 284. SCTagChk[5] 285. SCTagChk[4] 286. SCTagChk[3] 287. SCTagChk[2] 288. SCTagChk[1] 
289. SCTagChk[0] 290. SysCmd[0] 291. SysCmd[1] 292. SysCmd[2] 293. SysCmd[3] 294. SysCmd[4] 
295. SysCmd[5] 296. SysCmd[6] 297. SysCmd[7] 298. SysCmd[8] 299. SysCmd[9] 300. SysCmd[10] 
301. SysCmd[11] 302. SysCmdPar 303. SysVal* 304. SysReq* 305. SysRel* 306. SysGnt* 
307. SysReset* 308. SysRespVal* 309. SysRespPar 310. SysResp[4] 311. SysResp[3] 312. SysResp[2] 
313. SysResp[1] 314. SysResp[0] 315. SysGblPerf* 316. SysRdRdy* 317. SysWrRdy* 318. SysStateVal* 
319. SysStatePar 320. SysState[2] 321. SysState[1] 322. SysState[0] 323. SysCorErr* 324. SysUncErr* 
325. SysNMI* 326. SCDataChk[7] 327. SCDataChk[5] 328. SCData[127] 329. SCData[126] 330. SCData[125]
331. SCData[124] 332. SCData[123] 333. SCData[122] 334. SCData[121] 335. SCData[120] 336. SCData[119]
337. SCData[118] 338. SCData[117] 339. SCData[116] 340. SCData[115] 341. SCData[114] 342. SCData[113]
343. SCData[112] 344. SCBDWr* 345. SCBDOE* 346. SCBDCS* 347. SCBDWay 348. SCBAddr[8]
349. SCBAddr[7] 350. SCBAddr[6] 351. SCBAddr[5] 352. SCBAddr[4] 353. SCBAddr[3] 354. SCBAddr[2]
355. SCBAddr[1] 356. SCBAddr[0] 357. SCData[111] 358. SCData[110] 359. SCData[109] 360. SCData[108]
361. SCData[107] 362. SCData[106] 363. SCData[105] 364. SCData[104] 365. SysAD[63] 366. SysAD[62]
367. SysAD[61] 368. SysAD[60] 369. SysAD[59] 370. SysAD[58] 371. SysAD[57] 372. SysAD[56]
373. SysAD[55] 374. SysAD[54] 375. SysAD[53] 376. SysAD[52] 377. SysAD[51] 378. SysAD[50]
379. SysAD[49] 380. SysAD[48] 381. SysADChk[7] 382. SysADChk[6] 383. SysADChk[5] 384. SysADChk[4]
385. SysADChk[3] 386. SysADChk[2] 387. SysADChk[1] 388. SysADChk[0] 389. SysAD[31] 390. SysAD[30]
391. SysAD[29] 392. SysAD[28] 393. SysAD[27] 394. SysAD[26] 395. SysAD[25] 396. SysAD[24]
397. SysAD[23] 398. SysAD[22] 399. SysAD[21] 400. SysAD[20] 401. SysAD[19] 402. SysAD[18]
403. SysAD[17] 404. SysAD[16] 405. SCData[103] 406. SCData[102] 407. SCData[101] 408. SCData[100]
409. SCData[99] 410. SCData[98] 411. SCData[97] 412. SCData[96] 413. SCDataChk[3] 414. SCBAddr[9]
415. SCBAddr[10] 416. SCBAddr[11] 417. SCBAddr[12] 418. SCBAddr[13] 419. SCBAddr[14] 420. SCBAddr[15] 
421. SCBAddr[16] 422. SCBAddr[17] 423. SCBAddr[18] 


11. Electrical Specifications 


This chapter contains the following electrical and signal information about the R10000 
processor: 


* DC electrical specification 
* AC electrical specification 
* signal integrity issues 


11.1 DC Electrical Specification 


This section describes the following DC electrical characteristics of the R10000 processor: 


* DC power supply levels 
* DCOk and power supply sequencing 
* maximum operating conditions 
* input signal level sensing 
* mode definitions 
* Vref[SC,Sys] 
* unused inputs 
* DC input/output specifications 


DC Power Supply Levels 


The processor core is powered by a +3.3V (+/- 5%) supply. The processor output drivers 
are powered from a separate supply, dependent on the output logic family used in the 
application system: 


* For JEDEC-compatible HSTL operation, the nominal values for VccQSC and 
VccQSys are in the 1.5V (+/- 100 millivolt) range. 


* For CMOS/TTL compatible systems, VccQSC and VccQSys can be externally 
tied to the same Vcc as the core power supply. 


NOTE: The I/O pins of the R10000 processor must not be driven higher than 4.0V by 
any device in the system until the Vcc and VccQ inputs are stable. 


DCOk and Power Supply Sequencing 


The following guidelines are designed to protect the processor from damage or latch-up: 


* VccQ[SC,Sys] (either 1.5V or 3.3V) must not be driven more than a diode 
threshold voltage above Vcc (3.3V), the supply to the core. 


* Vref should not go higher than VccQ[SC,Sys]. Generally, Vref is derived 
from VccQ through a resistor divider, and therefore cannot rise above VccQ. 


* The power to termination resistors must not arrive before Vcc and 
VccQ[SC,Sys] arrive at the processor. 


* None of the supplies can float or be driven negative. 


One method of protecting the processor from excessive input voltage is to sequence the 
power supplies for the entire system, ensuring that the power to the processor is stable 
before any components drive signals to the processor. Another method is to tristate all 
external drivers to the processor with the DCOk pin until the processor has stabilized. 


NOTE: The input voltage required for DCOk is 3.3V in either the CMOS/TTL or 
the HSTL configuration. Both DCOk pins must be tied together externally. 
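

The sequencing guidelines above can be expressed as a simple check over one snapshot of the supply voltages. The following is only a sketch: the function name, its parameters, and the 0.6 V diode threshold are assumptions made for illustration, and a floating supply cannot be detected from a voltage reading alone.

```python
# Assumed value: a silicon diode threshold of about 0.6 V (not stated in the text).
DIODE_THRESHOLD_V = 0.6


def supply_violations(vcc, vccq, vref, vterm):
    """Return a list of guideline violations for one snapshot of supply voltages.

    vcc   -- core supply at the processor (volts)
    vccq  -- output driver supply, VccQSC or VccQSys (volts)
    vref  -- reference voltage, VrefSC or VrefSys (volts)
    vterm -- supply feeding the termination resistors (volts)
    """
    problems = []
    if vccq - vcc > DIODE_THRESHOLD_V:
        problems.append("VccQ exceeds Vcc by more than a diode threshold")
    if vref > vccq:
        problems.append("Vref is higher than VccQ")
    # Termination power must not arrive before Vcc and VccQ are present.
    if vterm > 0.0 and (vcc <= 0.0 or vccq <= 0.0):
        problems.append("termination power present before Vcc/VccQ")
    if min(vcc, vccq, vref, vterm) < 0.0:
        problems.append("a supply is driven negative")
    return problems


# Snapshots from a hypothetical HSTL power-up ramp.
print(supply_violations(vcc=3.3, vccq=1.5, vref=0.75, vterm=1.5))
print(supply_violations(vcc=0.0, vccq=1.5, vref=0.75, vterm=1.5))
```

The first snapshot reports no violations; the second reports two, because VccQ and the termination supply are up while Vcc is still at 0V.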


Maximum Operating Conditions 


Table 11-1 shows the maximum conditions under which the processor operates. 


Table 11-1 Maximum Operating Conditions 


Parameter                Symbol     Value 
Core Supply Voltage      Vcc        3.6 volts 
Output Supply Voltage 
Case Temperature         Tc         20° to 85°C 
Applied Input Voltage    Vin        -0.5 to Vcc+0.5 volts 
Maximum Power            PR10000    30 watts 
PClk Frequency           f          200 MHz 


Errata 


Revised “Case Temperature” in Table 11-1, above. 


Input Signal Level Sensing 


The processor input signals are all received by CMOS receivers that are compatible with 
either HSTL or CMOS/TTL logic levels. The I/O levels are defined by VrefSC and 
VrefSys, according to the appropriate logic family (HSTL or CMOS/TTL). 


Mode Definitions 


The mode bit, ODrainSys, is provided to select the characteristics of the pad ring. 


When asserted, this mode bit tristates the PMOS pullup devices to select system interface 
output drivers. This mode is included to allow multiprocessor systems to use a GTL-like 
open drain configuration with external load/termination resistors providing logic high 
levels. 


Vref[SC,Sys] 


The Vref[SC,Sys] pins must be connected to a stable reference voltage source. This 
reference point is used in the input sense amp current mirror to provide the switch point for 
the logic levels. 


Inside the processor, the Vref[SC,Sys] signals have a large capacitance, and a low-pass 
filter at each receiver. The DCOk pins must not be asserted until there has been sufficient 
time for Vref[SC,Sys] to stabilize at each of the receivers inside the processor. 


A typical Vref[SC,Sys] generator is two resistors which provide the Vref[SC,Sys] level 
associated with the chosen logic family, and a 10 µF tantalum capacitor connected to the 
processor’s Vref[SC,Sys] pin to provide stability. 
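

As a rough illustration of such a generator, the sketch below computes the unloaded output of a two-resistor divider and the RC time constant formed with the bypass capacitor. The 1 kΩ resistor values and the choice of VccQ/2 as the target level are assumptions for the example, not values from this manual, and the internal low-pass filters at the receivers are not modeled.

```python
def vref_divider(vccq, r_top, r_bottom, c_bypass=10e-6):
    """Return (vref, tau) for a two-resistor divider from VccQ to Vss.

    vref -- unloaded divider output in volts
    tau  -- RC time constant in seconds seen by the bypass capacitor,
            i.e. (r_top || r_bottom) * c_bypass
    """
    vref = vccq * r_bottom / (r_top + r_bottom)
    r_parallel = r_top * r_bottom / (r_top + r_bottom)
    return vref, r_parallel * c_bypass


# Hypothetical HSTL case: 1.5 V VccQ and two equal 1 kOhm resistors (assumed values).
vref, tau = vref_divider(vccq=1.5, r_top=1000.0, r_bottom=1000.0)
print(f"Vref = {vref:.3f} V, tau = {tau * 1e3:.1f} ms")
```

A time constant of this kind gives a first estimate of how long the external Vref node takes to settle, which matters because DCOk must not be asserted before Vref[SC,Sys] is stable.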


DC Input/Output Specifications 


All processor output drivers are CMOS push-pull, and the signals swing between VccQ and 
Vss. In open drain mode, the gates of the PMOS pullup devices are disabled. Input-only 
pins include a disabled output buffer for implicit ESD protection. 


Tables 11-2 and 11-3 describe the DC characteristics of the I/O signals for the HSTL and 
CMOS/TTL configurations. 


NOTE: As the JEDEC Standard 8-x evolves, the HSTL specifications will also 
change, and the processor will remain compliant with these standards. 


Table 11-2 DC Characteristics for HSTL Configuration 


Symbol   Parameter             Minimum          Maximum          Units   Conditions 
VOH      Output high voltage   VccQ/2 + 0.3V    N/A              V       N/A 
VOL      Output low voltage                     VccQ/2 - 0.3V    V       N/A 
VIH      Input high voltage    Vref + 100mV     Vcc + 300mV      V       N/A 
VIL      Input low voltage     -300mV           Vref - 100mV     V       N/A 
ILeak    I/O leakage current   -TBD             TBD              mA      N/A 


Table 11-3 DC Characteristics for CMOS/TTL Configuration 


Symbol   Parameter             Minimum          Maximum          Units   Conditions 
VOH      Output high voltage   2.4              N/A              V       Vcc = VccQ = min 
VOL      Output low voltage    N/A              0.4              V       Vcc = VccQ = min 
VIH      Input high voltage    2.0              N/A              V       N/A 
VIL      Input low voltage     N/A              0.8              V       N/A 
ILeak    I/O leakage current   -TBD             TBD              mA      N/A 
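

To make the HSTL input thresholds of Table 11-2 concrete, the sketch below classifies an input voltage relative to Vref. The function name and the 0.75 V reference used in the example are assumptions for illustration; the upper and lower absolute limits (Vcc + 300mV and -300mV) are not checked here.

```python
def classify_hstl_input(vin, vref):
    """Classify an input voltage against the HSTL thresholds of Table 11-2.

    Returns "high" at or above VIH(min) = Vref + 100 mV, "low" at or below
    VIL(max) = Vref - 100 mV, and "undefined" in the band between them.
    """
    if vin >= vref + 0.100:
        return "high"
    if vin <= vref - 0.100:
        return "low"
    return "undefined"


# Assumed reference of 0.75 V, i.e. half of a 1.5 V VccQ.
for vin in (0.40, 0.75, 1.10):
    print(vin, classify_hstl_input(vin, vref=0.75))
```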


Errata 


All the JTAG output drivers are push-pull CMOS/TTL compatible, with Vcc (core) as the 
supply (independent of VccQ[SC,Sys]). All the JTAG inputs require full CMOS swings, 
as given by the DC specifications in Table 11-3. 


11.2 AC Electrical Specification 


This section describes the following AC electrical characteristics of the R10000 processor: 
* maximum operating conditions 
* test specification 
* secondary cache and system interface timing 
* enable/output delay, setup, hold time 
* asynchronous inputs 


Maximum Operating Conditions 


The R10000 chip clamps signals that overshoot the DC limits established for input logic 
levels. These limits are published as part of the fabrication process characterization. 


The R10000 chip provides silicon diode clamps on all signal pins. 


Test Specification 


HSTL test conditions are based on the JEDEC Standard conditions. 


Secondary Cache and System Interface Timing 


Timing measurements are referenced from the mid-swing point of the input signal to the 
crossing point of the SysClk and SysClk* input clocks. All input signals maintain a 
1 V/ns edge rate in the 20% to 80% range of the input signal swing. 


Enable/Output Delay, Setup, Hold Time 


Table 11-4 lists the delay, setup, and hold times for the HSTL version of the processor. 


Table 11-4 AC Characteristics for HSTL Configuration 


HSTL            Minimum    Maximum 
Output delay    0.5 ns     1.5 ns 
Setup           1.0 ns 
Hold            1.0 ns 


Table 11-5 lists the delay, setup, and hold times for the CMOS/TTL version of the 
processor. 


Table 11-5 AC Characteristics for CMOS/TTL Configuration 


LVCMOS          Minimum    Maximum 
Output delay    0.5 ns     2.0 ns 
Setup           1.0 ns 
Hold            1.0 ns 
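

One common use of these numbers is a first-order timing budget for a signal driven by the processor and captured one clock later. The sketch below is an illustration under stated assumptions: the 10 ns clock period is hypothetical, the receiving device is assumed to have the same 1.0 ns setup and hold requirements as the processor, and clock skew and jitter are ignored.

```python
def max_flight_time_ns(clock_period_ns, output_delay_max_ns, receiver_setup_ns):
    """Longest board delay that still meets the receiver's setup time."""
    return clock_period_ns - output_delay_max_ns - receiver_setup_ns


def min_flight_time_ns(output_delay_min_ns, receiver_hold_ns):
    """Shortest board delay that still meets the receiver's hold time."""
    return max(0.0, receiver_hold_ns - output_delay_min_ns)


# Hypothetical 10 ns clock period; output delays taken from Tables 11-4 and 11-5.
print(max_flight_time_ns(10.0, 1.5, 1.0))  # HSTL: 7.5
print(max_flight_time_ns(10.0, 2.0, 1.0))  # CMOS/TTL: 7.0
print(min_flight_time_ns(0.5, 1.0))        # either configuration: 0.5
```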


Asynchronous Inputs 


The SysReset* input can be asserted asynchronously to SysClk, but must be negated 
synchronously with SysClk, adhering to the AC electrical specifications listed above. 


11.3 Signal Integrity Issues 


In this section, the following signal integrity considerations are described for an 
R10000-based system: 


* Power Supply Regulation 
* Decoupling Capacitance 
* Reference Voltage 
* Maximum Input Voltage Levels 
* Output I-V Curves 
* Switching and Slew Rate Characteristics 


Reference Voltage 


Most input pins on the processor use a current-mirror sense amp with Vref[SC,Sys] 
supplied to the negative input to provide a single rail input receiver. The following input 
pins are exceptions to this rule: 


* SysClk and SysClk* 
* DCOk 


All other inputs require a stable Vref[SC,Sys] supply for proper operation. 


The Vref[SC,Sys] source can be a simple voltage divider; the actual impedance of this 
source is not critical, since the Vref[SC,Sys] signals are sampled through a low-pass filter 
on the processor. 


Power Supply Regulation 


The system must provide connections to all of the Vcc, VccQ[SC,Sys], and Vss pins on the 
processor package. The power supply voltages must be held to 5% tolerance at the 
processor pin connection. 


Maximum Input Voltage Levels 


Maximum excursion of the input signal due to ringing may reach Vcc+0.5V or Vss-0.5V 
for periods of less than 10% of the total driven waveform period. The R10000 processor 
includes silicon diode overshoot clamps, which limit the overshoot to 
approximately 500 mV beyond each supply rail. 
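

The ringing limit can be checked against a sampled waveform. The sketch below takes one reading of the rule: excursions outside the supply rails are tolerated up to 0.5V beyond each rail, provided they occupy less than 10% of the driven waveform period. The function name, the uniform sampling, and this interpretation are assumptions for illustration.

```python
def ringing_within_limits(samples, vcc, vss=0.0, margin=0.5, max_fraction=0.10):
    """Check uniformly spaced samples covering one driven waveform period.

    Returns True when no sample goes beyond Vcc+margin or Vss-margin and the
    samples lying outside the [Vss, Vcc] rails make up less than max_fraction
    of the period.
    """
    beyond_rails = 0
    for v in samples:
        if v > vcc + margin or v < vss - margin:
            return False
        if v > vcc or v < vss:
            beyond_rails += 1
    return beyond_rails < max_fraction * len(samples)


# 20 samples with a single 0.3 V overshoot above a 3.3 V rail.
wave = [0.0] * 10 + [3.6] + [3.3] * 9
print(ringing_within_limits(wave, vcc=3.3))
```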


Decoupling Capacitance 


Errata 


In order to regulate the transient current requirements of an R10000-based system, it is 
suggested that explicit decoupling capacitors be used. The R10000 package allows for the 
following capacitors: 

* eight Vcc-Vss 
* five VccQSC-Vss 
* four VccQSys-Vss 


The package also provides six connections for the PLL power supplies and loop capacitors. 


VccPa (VccPd) is connected to VssPa (VssPd) through three decoupling capacitors, as 
shown in Figures 11-1 and 11-2. The 0.1 µF and 1 nF low-inductance capacitors are placed 
in parallel with the 10 µF capacitor, as close to the R10000 package as possible.† 


[Figure: schematic with the labels Vcc, 10 ohm, VccPa, Vss, 10 ohm, VssPa, and capacitors of 10 µF, 0.1 µF, and 1 nF.] 

Figure 11-1 Decoupling VccPa and VssPa 


[Figure: schematic with the labels Vcc, 2 ohm, VccPd, Vss, 2 ohm, VssPd, and capacitors of 10 µF, 0.1 µF, and 1 nF.] 

Figure 11-2 Decoupling VccPd and VssPd 


† Decoupling between VccPa and VssPa is far more important than decoupling between VccPd 
and VssPd, if both are not possible. 


12. Packaging 


The R10000 microprocessor is presently supplied in one standard package configuration: 


* a single-chip 599 ceramic LGA (Land Grid Array) 


MIPS Licensees are encouraged to develop package solutions with MIPS Semiconductor 
Partners to meet specific requirements. 


12.1 R10000 Single-Chip Package, 599CLGA 


Mechanical Characteristics 


The standard single-chip R10000 package is a 599CLGA (ceramic land grid array), as 
shown in Figure 12-1. 


The 599CLGA package minimizes output switching noise by reducing the inductance of 
the power and ground paths leading into the package. Much of the decrease in 
power/ground inductance is accomplished by shortening the wire bonds running from the 
die pads to the package inner leads. The 599CLGA is designed with its cavity side down, 
and the die is connected directly to a thermal slug. 


The 599CLGA has lands on a straight 1.27mm (.050 inch) grid. It is a cavity-down, 
multilayer ceramic package with an integral copper-tungsten slug, and is designed for use 
with a socket. Preliminary information suggests that the 599CLGA can withstand a force of 
100 kilograms applied to the CuW slug without damage, and a PWB assembly should ensure 
that this force is not exceeded. Drawings for a reference LGA-PWB assembly are included 
in this chapter. 


Electrical Characteristics 


The 599CLGA uses multilayer construction, incorporating stripline configuration for 
signals. Multiple planes distribute power and ground throughout the package and provide 
built-in distributed bypass/coupling capacitance between the primary power supplies: Vcc, 
VccQSC, VccQSys, and Vss. 


Pads are present on the package body for attaching chip-capacitors to provide additional 
bypass capacitance between the primary power supplies and the PLL power supply (VccPa 
and VssPa), and to provide an additional PLL loop filter capacitor (PLLRC). 
Chip-capacitors on the R10000 are assembled by the chip manufacturer. 


Detailed electrical package characteristics will be provided by the MIPS Semiconductor 
Partners as they become available. The data in Table 12-1 is provided as an estimate of the 
package parasitics. These estimates include the effects of bondwires, package traces and 
vias, but not the sockets. 


Table 12-1 R10000 599CLGA Electrical Characteristics 


Parameter Description Minimum Typical Maximum 
Lsig Effective signal inductance 4.0nH 8.4nH 
Msig Signal-to-signal mutual inductance 1.3nH 
Csig Signal loading capacitance 3.0pF 5.6pF 
Cm Signal-to-signal mutual capacitance 0.5pF 
Rsig Signal resistance 400mΩ 1300mΩ 
Zo Characteristic impedance 40Ω 
Tpd Propagation delay 200ps 


The copper-tungsten slug (provided for thermal performance) is hard-connected to Vss to 
minimize EMI radiation from the package. 


Thermal Characteristics 


The 599CLGA incorporates a copper-tungsten slug to provide an efficient thermal path 
from the processor to the heatsink. 


The thermal analysis listed in Table 12-2 gives a preliminary indication of heatsink 
requirements for the 599CLGA. 


Table 12-2 R10000 599CLGA Thermal Characteristics - Preliminary 


Parameter   Description                                    Value 
Tc          Maximum case temperature                       85°C 
Ta ‡        Maximum ambient temperature                    40°C 
PR10000     Maximum power dissipation                      30 watts 
Tca         Minimum temperature differential               45°C 
θca ‡       Required case to ambient thermal resistance    1.5°C/W 


‡ θca is used as an example to calculate the ambient temperature, Ta, needed. 


Errata 


Revised Table 12-2. 


System designers must take care, especially in desktop applications, to ensure sufficient 
airflow and heat-dissipation surface area to meet the required case-to-ambient thermal 
resistance, θca. 


The thermal interface between the package and heatsink is very important. Typically, grease 
or compliant material is inserted between the package and heatsink to increase the contact 
area between their surfaces. 
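

The relationship between the Table 12-2 quantities is the usual steady-state one: the case-to-ambient temperature difference equals the dissipated power times the case-to-ambient thermal resistance. The sketch below works it in both directions; the function names are chosen for this example only.

```python
def required_theta_ca(t_case_max_c, t_ambient_max_c, power_w):
    """Case-to-ambient thermal resistance (deg C per watt) needed to hold the case limit."""
    return (t_case_max_c - t_ambient_max_c) / power_w


def max_ambient_c(t_case_max_c, theta_ca_c_per_w, power_w):
    """Highest ambient temperature (deg C) that keeps the case at or below its limit."""
    return t_case_max_c - theta_ca_c_per_w * power_w


# Values from Table 12-2: 85 C case, 40 C ambient, 30 W.
print(required_theta_ca(85.0, 40.0, 30.0))  # 1.5
print(max_ambient_c(85.0, 1.5, 30.0))       # 40.0
```

With a less capable heatsink arrangement (a larger thermal resistance), the same formula shows how far the maximum ambient temperature must drop.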


Assembly Drawings and Pinout List 


The following pages contain a pinout list (Table 12-3), and drawings of an example R10000 
LGA-PWB assembly, including details of the PWB, heatsink, and bolster plate. Actual 
hardware specifications are dependent on the user. 


An assembly drawing of the 599LGA is also shown in Figure 12-2. Note that hardware 
specifications given in this drawing will require modifications to accommodate the actual 
dimensions of the socket, PWB, heatsink, bolster, etc. 


[Figure: top and bottom views of the package outline. Labels recovered from the drawing: 47.50 ± 0.25, 3.81 ± 0.38, 2.54 ± 0.25, 1.27 SLUG, 0.70 MAX LID, (2.50 x 45°), 1.27, 599X 0.76 SQ.; land columns are numbered 1 through 35.] 

NOTES: 
1. Dimensions are in millimeters. 
2. Unless otherwise specified, tolerances are ±0.13. 
3. LID and SLUG are connected to Vss. 

MIPS TECHNOLOGIES, INC. PACKAGE OUTLINE, 599 CERAMIC LAND GRID ARRAY, REV 7 


Figure 12-1 R10000 599CLGA Package Outline 


599CLGA Pinout 


Table 12-3 599CLGA Pinout 

Signal Location Signal Location Signal Location 
DCOk AF.......... 2 DCOk Biss tak 22 JTCK Winsccateces 33 
JTDI W wccrevedess 35 JTDO Neer d ss 31 JTMS AA... 34 
PLLDis | DAR ee 24 PLLRC Peciiaceck ce 25 PLLSpare1 Crenkie 21 
PLLSpare2 Aswsicte 21 PLLSpare3 Dygshneshesc 21 PLLSpare4 
SCAAddr<0> SCAAddr<1> SCAAddr<2> 
SCAAddr<3> SCAAddr<4> SCAAddr<5> 
SCAAddr<6> SCAAddr<7> SCAAddr<8> 
SCAAddr<9> SCAAddr<10> Bisse 29 SCAAddr<11> 
SCAAddr<12> Dineen 30 SCAAddr<13> Cay a 31 SCAAddr<14> 
SCAAddr<15> Bisse 32 SCAAddr<16> Di avec 31 SCAAddr<17> 
SCAAddr<18> Oe 32 SCADCS* Besecicinse 9 SCADOE* 
SCADWay Bin eink 10 SCADWr* Agwidiuss 8 SCBAddr<0> 
SCBAddr<1> AP ee cacveits 12 SCBAddr<2> SCBAddr<3> 
SCBAddr<4> 72N Ree 12 SCBAddr<5> SCBAddr<6> 
SCBAddr<7> AP........... 10 SCBAddr<8> SCBAddr<9> 
SCBAddr<10> AP Ares 30 SCBAddr<11> SCBAddr<12> 
SCBAddr<13> AMS Seniese3s 30 SCBAddr<14> SCBAddr<15> 
SCBAddr<16> APS 38 32 SCBAddr<17> SCBAddr<18> 
SCBDCS* AN: sscees 10 SCBDOE* SCBDWay 
SCBDWr* AP Societe 9 SCClk<0> SCClk<1> 
SCClk<2> AA ue 31 SCClk<3> SCClk<4> 
SCClk<5> Be Qe 1 SCClk<0>* SCClk<1>* 
SCClk<2>* AB ......... 33 SCClk<3>* SCClk<4>* 
SCClk<5>* Bis seesvtiunae 4 SCData<0> SCData<1> 
SCData<2> Pie evicts 33 SCData<3> SCData<4> 
SCData<5> SCData<6> SCData<7> 
SCData<8> SCData<9> SCData<10> 
SCData<11> SCData<12> SCData<13> 
SCData<14> SCData<15> SCData<16> 
SCData<17> SCData<18> SCData<19> 
SCData<20> SCData<21> SCData<22> 
SCData<23> SCData<24> SCData<25> 
SCData<26> SCData<27> SCData<28> 
SCData<29> SCData<30> SCData<31> 
SCData<32> SCData<33> SCData<34> 
SCData<35> SCData<36> SCData<37> 
SCData<38> SCData<39> SCData<40> 
SCData<41> SCData<42> SCData<43> 
SCData<44> SCData<45> SCData<46> 
SCData<47> SCData<48> SCData<49> 
SCData<50> SCData<51> SCData<52> 
SCData<53> SCData<54> SCData<55> 
SCData<56> SCData<57> SCData<58> 
SCData<59> SCData<60> SCData<61> 
SCData<62> SCData<63> SCData<64> 

SCData<65> SCData<66> SCData<67> 

SCData<68> SCData<69> SCData<70> 

SCData<71> SCData<72> SCData<73> 

SCData<74> SCData<75> SCData<76> 

SCData<77> SCData<78> SCData<79> 

SCData<80> SCData<81> SCData<82> 

SCData<83> SCData<84> SCData<85> 

SCData<86> SCData<87> SCData<88> 

SCData<89> SCData<90> SCData<91> Cees 5 
SCData<92> Dinakece 5 SCData<93> SCData<94> Cras 4 
SCData<95> Biiats 3 SCData<96> SCData<97> APB echoes 29 
SCData<98> AM......... 28 SCData<99> SCData<100> Aliases 27 
SCData<101> AR cto 28 SCData<102> SCData<103> APs fesiees 27 
SCData<104> Aljessa 16 SCData<105> SCData<106> Alvadias 15 
SCData<107> AP ee ki 13 SCData<108> SCData<109> AN) css 13 
SCData<110> AM......... 14 SCData<111> SCData<112> AM........ 9 
SCData<113> AR aise. 8 SCData<114> SCData<115> AN... 8 
SCData<116> AM......... 8 SCData<117> SCData<118> AN... Z 
SCData<119> AR ue 6 SCData<120> SCData<121> AP Ses 6 
SCData<122> AM......... 6 SCData<123> SCData<124> Aliases 6 
SCData<125> AN sessetes 5 SCData<126> SCData<127> AP occas 4 
SCDataChk<0> Ditessecsenst 33 SCDataChk<1> AL Lies 32 SCDataChk<2> Catia 29 
SCDataChk<3> AR ue 30 SCDataChk<4> 1 eter 30 SCDataChk<5> AP scescss 3 
SCDataChk<6> Bey eteavretel 4 SCDataChk<7> AN... 4 SCDataChk<8> Dvteeviencett 3 
SCDataChk<9> Baeegese 27 SCTCS* Diesssasss 2 SCTag<0> Reisen 1 
SCTag<1> Rd nceaeversts 4 SCTag<2> Pasa easy 1 SCTag<3> Revel cetvut 5 
SCTag<4> eee 3 SCTag<5> NG Ga tiuss 2 SCTag<6> Pedvnaite 4 
SCTag<7> SCTag<8> SCTag<9> 

SCTag<10> SCTag<11> SCTag<12> 

SCTag<13> SCTag<14> SCTag<15> 

SCTag<16> SCTag<17> SCTag<18> 

SCTag<19> SCTag<20> SCTag<21> 

SCTag<22> SCTag<23> SCTag<24> 

SCTag<25> SCTagChk<0> SCTagChk<1> 

SCTagChk<2> Vovisseesn son 2 SCTagChk<3> eee 2) SCTagChk<4> Visseikened 1 
SCTagChk<5> U sssescteas 3 SCTagChk<6> ich estesees 1 SCTOE* Giseccids 5 
SCTWay Bescecsesesves 3 SCTWr* | eee 2 SCTagLSBAddr Tisai 5 
SelDVCO Bai cAsscases 21 Spare1 PF iitesciead 5 Spare3 

SysAD<0> SysAD<1> SysAD<2> 

SysAD<3> SysAD<4> SysAD<5> 

SysAD<6> SysAD<7> SysAD<8> 

SysAD<9> SysAD<10> SysAD<11> 

SysAD<12> SysAD<13> SysAD<14> 

SysAD<15> SysAD<16> SysAD<17> 

SysAD<18> SysAD<19> SysAD<20> 


SysAD<21> SysAD<22> SysAD<23> 
SysAD<24> SysAD<25> SysAD<26> 
SysAD<27> SysAD<28> SysAD<29> 
SysAD<30> SysAD<31> SysAD<32> 
SysAD<33> SysAD<34> SysAD<35> 
SysAD<36> SysAD<37> SysAD<38> 
SysAD<39> SysAD<40> SysAD<41> 
SysAD<42> SysAD<43> SysAD<44> 
SysAD<45> SysAD<46> SysAD<47> 
SysAD<48> SysAD<49> SysAD<50> 
SysAD<51> SysAD<52> SysAD<53> 
SysAD<54> SysAD<55> SysAD<56> 
SysAD<57> SysAD<58> SysAD<59> 
SysAD<60> SysAD<61> SysAD<62> 
SysAD<63> SysADChk<0> AN......... 22 SysADChk<1> 
SysADChk<2> AL ocr 21 SysADChk<3> AP sii 21 SysADChk<4> 
SysADChk<5> AR ....... 21 SysADChk<6> Alb wc 20 SysADChk<7> 
SysClk SysClk* SysClkRet* 
SysClkRet SysCmd<0> SysCmd<1> 
SysCmd<2> INA scosestece 1 SysCmd<3> SysCmd<4> 
SysCmd<5> ARs ests 4 SysCmd<6> SysCmd<7> 
SysCmd<8> AB ...... 3 SysCmd<9> SysCmd<10> 
SysCmd<11> AD ......... 1 SysCmdPar SysCorErr* 
SysCyc* Beet 20 SysGblPerf* AG oancsiccs 4 SysGnt* 
SysNMI* AK... 5 SysRdRdy* AT ieee 3 SysRel* 
SysReq* ACG cise 5 SysReset* AD .o...55008 3 SysResp<0> 
SysResp<1> AF eisceeces 5 SysResp<2> AM eesrcess 1 SysResp<3> 
SysResp<4> AG... 2 SysRespPar AE wivssesesd 4 SysRespVal* 
SysState<0> ALeerecds 1 SysState<1> ASbsccscvvees 5 SysState<2> 
SysStatePar AT 03085 4 SysStateVal AK... 1 SysUncErr* 
SysVal* AD......... 2 SysWrRdy* TCA 

TCB vA Breen 4 TriState VccPa 
VccPa Cree, 25 VccPd VrefByp 
VssPa VssPa VssPd 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 

Vcc Vcc Cissedteys 2 Vcc 
Vcc Vcc Ds che ouis 16 Vcc Di s.veteigies 20 
Vcc Vcc Bietieness 3 Vcc Becca 31 
Vcc Vcc | cee 5 Vcc ———i™s«C‘S CR 1 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc Vcc Vcc 
Vcc VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSC VccQSC 
VccQSC VccQSys VccQSys 
VccQSys VccQSys VccQSys 
VccQSys VccQSys VccQSys 
VccQSys VccQSys VccQSys 
VccQSys VccQSys VccQSys 
VccQSys VccQSys VccQSys 
VrefSC VrefSys Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 
Vss Vss Bites cack: 11 Vss By ssevieeceees 17 
Vss Vss By siesscteas 2 Vss Bistsciides 31 
Vss Vss Bascieeetaee 5 Vss Crnneies 1 
Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss 

Vss Vss Vss Gtheak 4 
Vss A ee 3 Vss Vss Distt eet 2 
Vss | Perec 31 Vss Vss | Eee 5 
Vss Nissen snes 1 Vss Vss Nivetacees 35 
Vss N baceeaees 4 Vss Vss Rossssestseses 33 
Vss Vss Vss Wi hie deas 34 
Vss Vss Vss WW ssccdiets 31 
Vss Vss 


[Figure: exploded view of the 599LGA assembly. Callouts, in the order they appear: 4x SHCS 6-32 x 1”; flat washer 1/2” O.D.; spacer 1/4” O.D. x 3/8”; compression spring P/N CO0420-063-1000M; heatsink; heatsink interface pad; 599LGA; 599LGA socket; PWB (.093 THK REF.); .005 Kapton, PSA both sides; bolster plate.] 

NOTES: 
1. Dimensions are in inches. 
2. Unless otherwise specified, tolerances are ±.005. 
3. Hardware specifications will require modification to accommodate actual dimensions of socket, PWB, heatsink, bolster, etc. 

MIPS TECHNOLOGIES, INC. 599LGA ASSEMBLY, REV 3 


Figure 12-2 599LGA Assembly Drawing 


[Figure: PWB footprint with a land detail and two via examples (Example 1, shown, and Example 2). Legible callouts include 599X lands in the contact region, non-plated holes of diameter .109 ± .003, .089 ± .003, and .030 ± .002, a .013 drill reference with a .009 finished via reference, and a .050 land pitch.] 

NOTES: 
Dimensions are in inches. 
Unless otherwise specified, tolerances are ±0.005. 
Plating: 30 micro-inches AU over 50 micro-inches NI. 
Solder mask prohibited beneath LGA socket 2.250 SQ. except as noted. 
Component keepout regions: top side 2.250 SQ.; bottom side 2.750 SQ. except 1.000 SQ. center region. 
Via detail shown for reference only. 

MIPS TECHNOLOGIES, INC. PWB FOOTPRINT, REV 4 


Figure 12-3 599LGA PWB Footprint 


[Figure: heatsink drawing. Legible callouts include an outline of 2.700 ± .010 square, a 2.150 hole spacing, 4x holes of diameter .234 +.005/-.002, a .500 diameter counterbore to achieve the .115 dimension (or an extrusion feature to match the side view), and a minimum .525 diameter clearance to the fins.] 

OPTION    AIRFLOW, LFPM 
-1        100 
-2        200 
-3        400 

NOTES: 
Dimensions are in inches. 
Unless otherwise specified, tolerances are ±.005. 
Material: Aluminum. 
Thermal resistance: THETA-SA = 0.9 C/W MAX. 
Fin design: vendor's option. 

MIPS TECHNOLOGIES, INC. HEATSINK, 599LGA, REV 4 


Figure 12-4 599LGA Heatsink 


[Figure: bolster plate drawing. Legible callouts include an outline of 2.700 ± .010 square, a 2.150 standoff spacing, a 1.100 dimension, a .100 thickness (without Kapton), break edges all round on both sides, PEM self-clinching standoffs P/N SOS-632-26, and .005 Kapton with PSA on both sides, cut to fit with no overhang allowed except in corners, mounted to the surface shown with a protective sheet on the exposed surface.] 

NOTES: 
1. Dimensions are in inches. 
2. Unless otherwise specified, tolerances are ±.005. 
3. Material: stainless steel or cold-rolled steel with rust-proof finish. 

MIPS TECHNOLOGIES, INC. BOLSTER PLATE, 599LGA 


Figure 12-5 599LGA Bolster Plate 


13. Coprocessor 0 


This chapter describes the Coprocessor 0 operation, concentrating on the CP0 register 
definitions and the R10000 processor implementation of CP0 instructions. 


The Coprocessor 0 (CP0) registers control the processor state and report its status. These 
registers can be read using MFC0 instructions and written using MTC0 instructions. CP0 
registers are listed in Table 13-1. 


Table 13-1 Coprocessor 0 Registers 


Register No. Register Name Description 
0 Index Programmable register to select TLB entry for reading or writing 
1 Random Pseudo-random counter for TLB replacement 
2 EntryLo0 Low half of TLB entry for even VPN (Physical page number) 
3 EntryLo1 Low half of TLB entry for odd VPN (Physical page number) 
4 Context Pointer to kernel virtual PTE table in 32-bit addressing mode 
5 PageMask Mask that sets the TLB page size 
6 Wired Number of wired TLB entries (lowest TLB entries not used for random replacement) 
7 Undefined Undefined 
8 BadVAddr Bad virtual address 
9 Count Timer count 
10 EntryHi High half of TLB entry (Virtual page number and ASID) 
11 Compare Timer compare 
12 Status Processor Status Register 
13 Cause Cause of the last exception taken 
14 EPC Exception Program Counter 
15 PRId Processor Revision Identifier 
16 Config Configuration Register (secondary cache size, etc.) 
17 LLAddr Load Linked memory address 
18 WatchLo Memory reference trap address (low bits Adr[31:3]) 
19 WatchHi Memory reference trap address (high bits Adr[39:32]) 
20 XContext Pointer to kernel virtual PTE table in 64-bit addressing mode 
21 FrameMask Mask the physical addresses of entries which are written into the TLB 
22 BrDiag Branch Diagnostic register 
23 Undefined Undefined 
24 Undefined Undefined 
25 PC Performance Counters 
26 ECC Secondary cache ECC and primary cache parity 
27 CacheErr Cache Error and Status register 
28 TagLo Cache Tag register - low bits 
29 TagHi Cache Tag register - high bits 
30 ErrorEPC Error Exception Program Counter 


Coprocessor 0 instructions are enabled if the processor is in Kernel mode, or if bit 28 (CU0) 
is set in the Status register. Otherwise, executing one of these instructions generates a 
Coprocessor 0 Unusable exception. 


13.1 Index Register (0) 


The Index register is a 32-bit, read/write register containing six bits to index an entry in the 
TLB. The high-order bit of the register shows the success or failure of a TLB Probe (TLBP) 
instruction. 


The Index register also specifies the TLB entry affected by TLB Read (TLBR) or TLB 
Write Index (TLBWI) instructions. 


Figure 13-1 shows the format of the Index register; Table 13-2 describes the Index register 
fields. 


Index Register 

Bits:    31    30..6    5..0 
Field:   P     0        Index 
Width:   1     25       6 

Figure 13-1 Index Register 


Table 13-2 Index Register Field Descriptions 

Field    Description 
P        Probe failure. Set to 1 when the previous TLBProbe (TLBP) instruction was unsuccessful. 
Index    Index to the TLB entry affected by the TLBRead and TLBWrite instructions. 
0        Reserved. Must be written as zeroes, and returns zeroes when read. 
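

As an illustration of the layout, the following sketch extracts the two fields from a 32-bit Index register value. The function name and the dictionary return format are choices made for this example only.

```python
def decode_index(value):
    """Split a 32-bit Index register value into its P and Index fields."""
    return {
        "P": (value >> 31) & 0x1,   # bit 31: probe failure
        "Index": value & 0x3F,      # bits 5:0: TLB entry number
    }


print(decode_index(0x8000_0000))  # probe failed, Index field is 0
print(decode_index(0x0000_002A))  # probe succeeded, entry 42
```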


13.2 Random Register (1) 


The Random register is a read-only register of which six bits index an entry in the TLB. 
This register decrements when any instruction graduates at that particular cycle, and its 
values range between an upper and a lower bound, as follows: 


* The lower bound is set by the number of TLB entries reserved for exclusive 
use by the operating system (the contents of the Wired register). 
* The upper bound is set by the total number of TLB entries minus 1 
(64 - 1 maximum). 


The Random register specifies the entry in the TLB that is affected by the TLB Write 
Random instruction. The register does not need to be read for this purpose; however, the 
register is readable to verify proper operation of the processor. 


To simplify testing, the Random register is set to the value of the upper bound upon system 
reset. This register is also set to the upper bound when the Wired register is written. 
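

The behavior described above can be modeled in a few lines. This is a toy model, not a description of the hardware: the class and method names are invented for the example, and the wrap from the lower bound back to the upper bound is an assumption, since the text states only that the value stays between the two bounds.

```python
class RandomRegister:
    """Toy model of the Random register for a 64-entry TLB."""

    UPPER = 63  # total number of TLB entries minus 1

    def __init__(self, wired=0):
        self.wired = wired
        self.value = self.UPPER      # set to the upper bound on reset

    def write_wired(self, wired):
        self.wired = wired
        self.value = self.UPPER      # writing Wired also sets the upper bound

    def graduate(self):
        """Advance one cycle in which an instruction graduates."""
        # Assumption: after reaching the lower bound, wrap to the upper bound.
        if self.value <= self.wired:
            self.value = self.UPPER
        else:
            self.value -= 1
        return self.value


r = RandomRegister(wired=60)
print([r.graduate() for _ in range(5)])  # [62, 61, 60, 63, 62]
```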


Figure 13-2 shows the format of the Random register; Table 13-3 describes the Random 
register fields. 


Random Register 

Bits:    31..6    5..0 
Field:   0        Random 
Width:   26       6 

Figure 13-2 Random Register 


Table 13-3 Random Register Field Descriptions 

Field     Description 
Random    TLB Random index 
0         Reserved. Must be written as zeroes, and returns zeroes when read. 


13.3 EntryLo0 (2), and EntryLo1 (3) Registers 


The EntryLo register consists of two registers with identical formats: 
e EntryLo0 is used for even virtual pages. 


¢ EntryLo! is used for odd virtual pages. 


The EntryLo0 and EntryLo/ registers are read/write registers. They hold the physical page 
frame number (PFN) of the TLB entry for even and odd pages, respectively, when 
performing TLB read and write operations. Figure 13-3 shows the format of these 
registers. 


EntryLo0 and EntryLo1 Registers

63 62 61        34 33          6 5   3   2   1   0
 UC        0            PFN        C     D   V   G
 2         28           28         3     1   1   1

Figure 13-3 Fields of the EntryLo0 and EntryLo1 Registers


Table 13-4 Description of EntryLo Registers’ Fields

Field   Description
UC      Uncached attribute
PFN     Page frame number; the upper bits of the physical address.
C       Specifies the TLB page coherency attribute.
D       Dirty. If this bit is set, the page is marked as dirty and, therefore, writable.
V       Valid. If this bit is set, it indicates that the TLB entry is valid; otherwise,
        a TLBL or TLBS invalid exception occurs.
G       Global. If this bit is set in both Lo0 and Lo1, then the processor ignores
        the ASID during TLB lookup.
0       Reserved. Must be written as zeroes, and returns zeroes when read.
The PFN fields of the EntryLo0 and EntryLo1 registers occupy register bits 33:6 and hold
the upper bits (39:12) of the 40-bit physical address.


Two additional bits for the mapped space’s uncached attribute can be loaded into bits 63:62
of the EntryLo register, which are then written into the TLB with a TLB Write. During the
address cycle of processor double/single/partial-word read and write requests, and during
the address cycle of processor uncached accelerated block write requests, the processor
drives the uncached attribute on SysAD[59:58]. The same EntryLo registers are used for
the 64-bit and 32-bit addressing modes. In both modes the registers are 64 bits wide;
however, when the MIPS III ISA is not enabled (32-bit User and Supervisor modes), only
the lower 32 bits of the EntryLo registers are accessible.


MIPS III is disabled when the processor is in 32-bit Supervisor or User mode. Loading of
the integer registers is limited to bits 31:0, sign-extended through bits 63:32, so
EntryLo[33:31] (physical address bits 39:37) can only be set to all zeroes or all ones.
In 32- and 64-bit modes, the UC and PFN bits of both EntryLo registers are written into
the TLB. The PFN bits can be masked by setting bits in the FrameMask register (described
in this chapter), but the UC bits cannot be masked or initialized in 32-bit User or
Supervisor modes. In 32-bit Kernel mode, MIPS III is enabled and 64-bit operations are
always available to program the UC bits.


There is only one G bit per TLB entry, and it is written with EntryLo0[0] and EntryLo1[0]
on a TLB write.
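
As an illustration of the field layout in Figure 13-3, the following sketch packs and unpacks an EntryLo value. The helper names are hypothetical, and the sketch assumes that the PFN holds physical address bits 39:12; it is not taken from any vendor header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions taken from Figure 13-3. */
#define ENTRYLO_G          (1ull << 0)
#define ENTRYLO_V          (1ull << 1)
#define ENTRYLO_C_SHIFT    3
#define ENTRYLO_PFN_SHIFT  6
#define ENTRYLO_PFN_MASK   0xFFFFFFFull   /* 28 bits */
#define ENTRYLO_UC_SHIFT   62

/* Hypothetical helper: build an EntryLo value from a physical address,
 * a 3-bit coherency attribute, a 2-bit uncached attribute and the V/G bits. */
uint64_t entrylo_pack(uint64_t paddr, unsigned c, unsigned uc, bool valid, bool global)
{
    assert(paddr < (1ull << 40));
    assert(c < 8u && uc < 4u);
    uint64_t pfn = (paddr >> 12) & ENTRYLO_PFN_MASK;  /* assumes PFN = paddr[39:12] */
    return ((uint64_t)uc << ENTRYLO_UC_SHIFT)
         | (pfn << ENTRYLO_PFN_SHIFT)
         | ((uint64_t)c << ENTRYLO_C_SHIFT)
         | (valid ? ENTRYLO_V : 0)
         | (global ? ENTRYLO_G : 0);
}

/* Recover the page-aligned physical address from an EntryLo value. */
uint64_t entrylo_paddr(uint64_t entrylo)
{
    return ((entrylo >> ENTRYLO_PFN_SHIFT) & ENTRYLO_PFN_MASK) << 12;
}
```

For example, a valid, global mapping of physical address 0x12345000 with coherency attribute 3 packs to 0x48D15B under these assumptions.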


13.4 Context Register (4)


The Context register is a read/write register containing the pointer to an entry in the page
table entry (PTE) array; this array is an operating system data structure that stores
virtual-to-physical address translations.


When there is a TLB miss, the CPU loads the TLB with the missing translation from the
PTE array. Normally, the operating system uses the Context register to address the current
page map, which resides in the kernel-mapped segment, kseg3. The Context register
duplicates some of the information provided in the BadVAddr register, but the information
is arranged in a form that is more useful for a software TLB exception handler.


Figure 13-4 shows the format of the Context register; Table 13-5 describes the Context 
register fields. 


Context Register

63                  23 22            4 3     0
       PTEBase             BadVPN2        0
         41                  19           4

Figure 13-4 Context Register Format


Errata: The 0 field in Table 13-5 is revised.

Table 13-5 Context Register Fields

Field     Description
BadVPN2   This field is written by hardware on a miss. It contains the
          virtual page number (VPN) of the most recent virtual address
          that did not have a valid translation.
0         Reserved. Must be written as zeroes, and returns zeroes when
          read.
PTEBase   This field is a read/write field for use by the operating system.
          It is normally written with a value that allows the operating
          system to use the Context register as a pointer into the current
          PTE array in memory.


The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that caused the TLB
miss; bit 12 is excluded because a single TLB entry maps to an even-odd page pair. For a
4-Kbyte page size, this format can directly address the pair-table of 8-byte PTEs. For other
page and PTE sizes, shifting and masking this value produces the appropriate address.
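
The arithmetic behind the direct-addressing case can be sketched as follows. The function name is hypothetical; the sketch simply models the value that results when hardware places VA[31:13] into Context[22:4] above a software-written PTEBase. Because BadVPN2 starts at bit 4, consecutive page pairs land 16 bytes apart, which is the size of two 8-byte PTEs.

```c
#include <assert.h>
#include <stdint.h>

#define CONTEXT_BADVPN2_SHIFT 4
#define CONTEXT_BADVPN2_MASK  0x7FFFFull       /* 19 bits, Context[22:4] */
#define CONTEXT_PTEBASE_MASK  (~0x7FFFFFull)   /* Context[63:23] */

/* Hypothetical model of the Context value formed on a TLB miss
 * in 32-bit addressing mode. */
uint64_t context_value(uint64_t ptebase, uint32_t bad_vaddr)
{
    uint64_t badvpn2 = ((uint64_t)bad_vaddr >> 13) & CONTEXT_BADVPN2_MASK;
    return (ptebase & CONTEXT_PTEBASE_MASK) | (badvpn2 << CONTEXT_BADVPN2_SHIFT);
}
```

A refill handler for 4-Kbyte pages could treat the result as the address of the even-page PTE, with the odd-page PTE 8 bytes above it.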
13.5 PageMask Register (5)


The PageMask register is a read/write register used for reading from or writing to the TLB;
it holds a comparison mask that sets the variable page size for each TLB entry, as shown in
Table 13-6. The format of the register is shown in Figure 13-5.


TLB read and write operations use this register as either a source or a destination; when
virtual addresses are presented for translation into physical addresses, the corresponding bits
in the TLB identify which virtual address bits among bits 24:13 are used in the comparison.
When the Mask field is not one of the values shown in Table 13-6, the operation of the TLB
is undefined. The 0 field is reserved; it must be written as zeroes, and returns zeroes when
read.

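PageMask Register
31      25 24        13 12        0
    0         MASK          0
    7          12           13
Figure 13-5 PageMask Register

Table 13-6 Mask Field Values for Page Sizes
Page Size     Bit (Mask)
              24 23 22 21 20 19 18 17 16 15 14 13
4 Kbytes       0  0  0  0  0  0  0  0  0  0  0  0
16 Kbytes      0  0  0  0  0  0  0  0  0  0  1  1
64 Kbytes      0  0  0  0  0  0  0  0  1  1  1  1
256 Kbytes     0  0  0  0  0  0  1  1  1  1  1  1
1 Mbyte        0  0  0  0  1  1  1  1  1  1  1  1
4 Mbytes       0  0  1  1  1  1  1  1  1  1  1  1
16 Mbytes      1  1  1  1  1  1  1  1  1  1  1  1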
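
A register value for each supported page size can be computed rather than looked up. The sketch below assumes the usual MIPS encoding, in which each factor of four in page size sets one more pair of Mask bits starting at bit 13; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGEMASK_SHIFT 13   /* Mask field occupies bits 24:13 */

/* Hypothetical helper: returns the PageMask register value for a page size
 * in bytes, or UINT32_MAX if the size is not one of the seven sizes from
 * 4 Kbytes to 16 Mbytes listed in Table 13-6. */
uint32_t pagemask_for_size(uint32_t page_bytes)
{
    for (uint32_t size = 4096u; size <= (16u << 20); size <<= 2) {
        if (size == page_bytes)
            return ((size >> 12) - 1u) << PAGEMASK_SHIFT;
    }
    return UINT32_MAX;
}
```

Rejecting unsupported sizes matters here because the text states that TLB operation is undefined for any other Mask value.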
13.6 Wired Register (6)


The Wired register is a read/write register that specifies the boundary between the wired and 
random entries of the TLB as shown in Figure 13-6. Wired entries are fixed, 
nonreplaceable entries, which cannot be overwritten by a TLB write operation. Random 
entries can be overwritten. 


TLB

63
        Range of Random entries
                                    <- Wired register (this entry is Random, not Wired)
        Range of Wired entries
0

Figure 13-6 Wired Register Boundary


The Wired register is set to 0 upon system reset. Writing this register also sets the Random 
register to the value of its upper bound (see Random register, above). Figure 13-7 shows 
the format of the Wired register; Table 13-7 describes the register fields. 


Wired Register

31                     6 5         0
           0               Wired
           26                6

Figure 13-7 Wired Register


Table 13-7 Wired Register Field Descriptions

Field   Description
Wired   TLB Wired boundary
0       Reserved. Must be written as zeroes, and returns zeroes when read.
13.7 BadVAddr Register (8)


The Bad Virtual Address register (BadVAddr) is a read-only register that displays the most 
recent virtual address that caused either a TLB or Address Error exception. The BadVAddr 
register remains unchanged during Soft Reset, NMI, or Cache Error exceptions. Otherwise, 
the architecture leaves this register undefined. 


Figure 13-8 shows the format of the BadVAddr register.


BadVAddr Register

63                            0
      Bad Virtual Address
              64

Figure 13-8 BadVAddr Register Format


13.8 Count and Compare Registers (9 and 11) 




The Count and Compare registers are 32-bit read/write registers whose formats are shown 
in Figure 13-9. 


The Count register acts as a real-time timer. Like the R4400 implementation, the R10000
Count register is incremented every other PClk cycle. However, unlike the R4400, the
R10000 processor has no Timer Interrupt Enable boot-mode bit, so the only way to disable
the timer interrupt is to negate the interrupt mask bit, IM[7], in the Status register. This
means the timer interrupt cannot be disabled without also disabling the Performance
Counter interrupt, since they share IM[7].


The Compare register can be programmed to generate an interrupt at a particular time, and
is continually compared to the Count register. Whenever their values are equal, the interrupt
bit IP[7] in the Cause register is set. This interrupt bit is reset whenever the Compare
register is written.
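
The interaction between Count, Compare, and IP[7] can be sketched with a small software model. All names here are hypothetical, and the model only checks for equality after each increment, which is a simplification of the continuous comparison described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the Count/Compare pair. */
typedef struct {
    uint32_t count;
    uint32_t compare;
    bool     ip7;      /* models IP[7] in the Cause register */
} timer_model;

/* One Count increment; the 32-bit counter wraps naturally. */
void timer_tick(timer_model *t)
{
    t->count++;
    if (t->count == t->compare)
        t->ip7 = true;
}

/* Writing Compare resets the interrupt bit, as the text states. */
void timer_write_compare(timer_model *t, uint32_t value)
{
    t->compare = value;
    t->ip7 = false;
}

/* Arm the timer 'ticks' increments from now; unsigned arithmetic handles
 * the case where Count wraps before reaching Compare. */
void timer_arm(timer_model *t, uint32_t ticks)
{
    assert(ticks != 0u);
    timer_write_compare(t, t->count + ticks);
}
```

Computing the next deadline as Count plus an interval, rather than as an absolute value, keeps the sketch correct across counter wraparound.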


Count (9)      32-bit Counter (incremented every processor cycle)
Compare (11)   32-bit Compare Value
Both registers feed a 32-bit Equal-to Comparator, which sets IP7 in the Cause Register.

Figure 13-9 Count and Compare Registers
13.9 EntryHi Register (10)


The EntryHi register holds the high-order bits of a TLB entry for TLB read and write 
operations. 


The EntryHi register is accessed by the TLB Probe, TLB Write Random, TLB Write 
Indexed, and TLB Read Indexed instructions. 


Figure 13-10 shows the format of this register and Table 13-8 describes the register’s fields.


EntryHi Register
63 62 61      44 43          13 12     8 7        0
  R     FILL          VPN2          0       ASID
  2      18            31           5         8

Figure 13-10 EntryHi Register


Table 13-8 EntryHi Register Fields

Field   Description
VPN2    Virtual page number divided by two (maps to two pages); upper bits of
        the virtual address
ASID    Address space ID field. An 8-bit field that lets multiple processes share
        the TLB; each process has a distinct mapping of otherwise identical
        virtual page numbers.
R       Region (00 → user, 01 → supervisor, 11 → kernel); used to match
        vAddr[63:62]
FILL    Reserved. 0 on read; ignored on write.
0       Reserved. Must be written as zeroes, and returns zeroes when read.


In 64-bit addressing mode, the VPN2 field contains bits 43:13 of the 44-bit virtual address. 


In 32-bit addressing mode only the lower 32 bits of the EntryHi register are used, so the 
format remains the same as in the R4400 processor’s 32-bit addressing mode. The FILL 
field is ignored on write and read as zeroes, as it was in the R4400 implementation. 


When either a TLB refill, TLB invalid, or TLB modified exception occurs, the EntryHi 
register is loaded with the virtual page number (VPN2) and the ASID of the virtual address 
that did not have a matching TLB entry. 
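
For the 64-bit addressing case, the EntryHi layout can be illustrated with a packing helper. The function name is hypothetical; the sketch relies on the fact that R and VPN2 sit at the same bit positions in EntryHi as in the virtual address (63:62 and 43:13), so only masking is needed.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRYHI_R_MASK     (0x3ull << 62)          /* bits 63:62 */
#define ENTRYHI_VPN2_MASK  (0x7FFFFFFFull << 13)   /* bits 43:13 */
#define ENTRYHI_ASID_MASK  0xFFull                 /* bits 7:0 */

/* Hypothetical helper: form an EntryHi value from a 64-bit virtual address
 * and an 8-bit ASID. FILL and the 0 field are left as zero. */
uint64_t entryhi_pack(uint64_t vaddr, unsigned asid)
{
    assert(asid <= ENTRYHI_ASID_MASK);
    return (vaddr & (ENTRYHI_R_MASK | ENTRYHI_VPN2_MASK)) | (uint64_t)asid;
}
```

Bit 12 of the virtual address is dropped by the mask, which matches the even-odd page pairing of a TLB entry.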
13.10 Status Register (12)


The Status register (SR) is a read/write register that contains the operating mode, interrupt
enabling, and the diagnostic states of the processor. The following list describes the more
important Status register fields; Figure 13-11 shows the format of the entire register, and
Table 13-10 describes the Status register fields.


Some of the important fields include:


•   The 4-bit Coprocessor Usability (CU) field controls the usability of 4 possible
    coprocessors. Regardless of the CU0 bit setting, CP0 is always usable in
    Kernel mode. The XX bit enables the MIPS IV ISA in User mode.


•   By default, the R10000 processor implements the same user instruction set as
    the R4400 processor. To enable execution of the MIPS IV instructions in User
    mode, the MIPS IV User Mode bit (XX) of the CP0 Status register must be
    set.

    The MIPS IV instruction extension uses COP1X as the opcode; this designation was
    COP3 in the R4400 processor. For this reason the CU3 bit is omitted in the R10000
    processor, and is used as the XX bit. In Kernel and Supervisor modes, the state of the
    XX bit is ignored, and MIPS IV instructions are always available.


Mode bit settings are shown in Table 13-9; dashes in the table represent don’t cares.


Table 13-9 ISA and Status Register Settings for User, Supervisor and
Kernel Mode Operations

Mode         UX   SX   KX   XX   MIPS II   MIPS III   MIPS IV
User         0    -    -    0    Yes       No         No
User         0    -    -    1    Yes       No         Yes
User         1    -    -    0    Yes       Yes        No
User         1    -    -    1    Yes       Yes        Yes
Supervisor   -    0    -    -    Yes       No         Yes
Supervisor   -    1    -    -    Yes       Yes        Yes
Kernel       -    -    -    -    Yes       Yes        Yes


NOTE: Operation with the MIPS IV ISA does not assume or require that the MIPS
III instruction set or 64-bit addressing be enabled; KX, SX and UX may all be set to
zero.
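
Table 13-9 reduces to a few lines of logic, restated here as a sketch. The enum, struct, and function names are hypothetical; MIPS II is omitted from the result because the table lists it as available in every row.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODE_USER, MODE_SUPERVISOR, MODE_KERNEL } cpu_mode;

/* Availability of the optional ISA levels in a given mode. */
typedef struct {
    bool mips3;
    bool mips4;
} isa_avail;

/* Hypothetical restatement of Table 13-9. KX is not a parameter because
 * every KX column entry in the table is a don't care. */
isa_avail isa_available(cpu_mode mode, bool ux, bool sx, bool xx)
{
    isa_avail a;
    switch (mode) {
    case MODE_USER:
        a.mips3 = ux;     /* UX enables MIPS III in User mode */
        a.mips4 = xx;     /* XX enables MIPS IV in User mode */
        break;
    case MODE_SUPERVISOR:
        a.mips3 = sx;     /* SX enables MIPS III in Supervisor mode */
        a.mips4 = true;   /* XX is ignored outside User mode */
        break;
    default:              /* MODE_KERNEL */
        a.mips3 = true;
        a.mips4 = true;
        break;
    }
    return a;
}
```

The User-mode rows show that MIPS IV can be enabled with UX = 0, which is the point of the NOTE above.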
•   The Reduced Power (RP) bit is reserved and should be zero. The R10000
    processor does not define a reduced power mode.

•   The Reverse-Endian (RE) bit, bit 25, reverses the endianness of the machine.
    The processor can be configured as either little-endian or big-endian at system
    reset; reverse-endian selection is available in Kernel and Supervisor modes,
    and in the User mode when the RE bit is 0. Setting the RE bit to 1 inverts the
    User mode endianness.

•   The 9-bit Diagnostic Status (DS) field is used for self-testing, and checks the
    cache and virtual memory system. This field is described in Table 13-11 and
    Figure 13-12.

•   The 8-bit Interrupt Mask (IM) field controls the enabling of eight interrupt
    conditions. Interrupts must be enabled before they can be asserted, and the
    corresponding bits are set in both the Interrupt Mask field of the Status
    register and the Interrupt Pending field of the Cause register.

•   The processor mode is undefined if the KSU field is set to 3 (11₂). The
    R10000 processor implements this as User mode.


Status Register

31  30:28    27  26  25  24:23  22   21  20  19   18  17  16  15:8         7   6   5   4:3  2    1    0
XX  CU2:CU0  RP  FR  RE  0      BEV  TS  SR  NMI  CH  CE  DE  IM (8 bits)  KX  SX  UX  KSU  ERL  EXL  IE

Bits 31:28 are the Coprocessor Usable bits; bits 24:16 are the Diagnostic Status fields.

Figure 13-11 Status Register
Table 13-10 describes the Status register fields.


Table 13-10 Status Register Fields

Field   Description
XX      Enables execution of MIPS IV instructions in User mode.
        1 → MIPS IV instructions usable
        0 → MIPS IV instructions unusable
CU      Controls the usability of each of the four coprocessor unit
        numbers. CP0 is always usable when in Kernel mode, regardless
        of the setting of the CU0 bit.
        1 → usable
        0 → unusable
RP      In the R4400 processor, this bit enables reduced-power operation
        by reducing the internal clock frequency. In the R10000
        processor, this bit should be set to zero.
FR      Enables additional floating-point registers.
        0 → 16 registers
        1 → 32 registers
RE      Reverse-Endian bit, valid in User mode.
DS      Diagnostic Status field (see Figure 13-12).
IM      Interrupt Mask: controls the enabling of each of the external,
        internal, and software interrupts. An interrupt is taken if interrupts
        are enabled, and the corresponding bits are set in both the Interrupt
        Mask field of the Status register and the Interrupt Pending field of
        the Cause register.
        0 → disabled
        1 → enabled
KX      Enables 64-bit addressing in Kernel mode. The extended-addressing
        TLB refill exception is used for TLB misses on kernel
        addresses.
        0 → 32-bit
        1 → 64-bit
SX      Enables 64-bit addressing and operations in Supervisor mode. The
        extended-addressing TLB refill exception is used for TLB misses
        on supervisor addresses.
        0 → 32-bit
        1 → 64-bit
UX      Enables 64-bit addressing and operations in User mode. The
        extended-addressing TLB refill exception is used for TLB misses
        on user addresses.
        0 → 32-bit
        1 → 64-bit



Chapter 13 Coprocessor 0 


Table 13-10 (cont.) Status Register Fields

Field   Description
KSU     Mode bits
        11₂ → Undefined (implemented as User mode)
        10₂ → User
        01₂ → Supervisor
        00₂ → Kernel
ERL     Error Level; set by the processor when a Reset, Soft Reset, NMI, or
        Cache Error exception is taken.
        0 → normal
        1 → error
EXL     Exception Level; set by the processor when any exception other
        than a Reset, Soft Reset, NMI, or Cache Error exception is taken.
        0 → normal
        1 → exception
IE      Interrupt Enable
        0 → disables all interrupts
        1 → enables all interrupts
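
The KSU and interrupt-mask rules in Table 13-10 can be sketched as two small decoders. The names are hypothetical. Bit positions follow Figure 13-11 (KSU at 4:3, IM at 15:8, IE at 0) and Figure 13-13 (IP at Cause bits 15:8). The sketch treats "interrupts are enabled" as IE = 1 only and does not model any other gating condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_IE           (1u << 0)
#define SR_KSU_SHIFT    3
#define SR_KSU_MASK     0x3u
#define SR_IM_SHIFT     8
#define CAUSE_IP_SHIFT  8
#define INT_FIELD_MASK  0xFFu

typedef enum { SR_MODE_KERNEL, SR_MODE_SUPERVISOR, SR_MODE_USER } sr_mode;

/* Decode the KSU field; 11 is undefined but implemented as User mode. */
sr_mode status_ksu_mode(uint32_t status)
{
    switch ((status >> SR_KSU_SHIFT) & SR_KSU_MASK) {
    case 0:  return SR_MODE_KERNEL;
    case 1:  return SR_MODE_SUPERVISOR;
    default: return SR_MODE_USER;
    }
}

/* True when IE is set and at least one interrupt is both pending in the
 * Cause register and unmasked in the Status register (simplified). */
bool interrupt_pending_enabled(uint32_t status, uint32_t cause)
{
    uint32_t im = (status >> SR_IM_SHIFT) & INT_FIELD_MASK;
    uint32_t ip = (cause >> CAUSE_IP_SHIFT) & INT_FIELD_MASK;
    return (status & SR_IE) != 0u && (im & ip) != 0u;
}
```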


The 9-bit Diagnostic Status (DS) field is used for self-testing, and checks the cache and
virtual memory system. This field is described in Table 13-11 and shown in Figure 13-12.


Some of the important DS fields include:


•   In the R4400, the TS bit of the diagnostic field indicates a TLB shutdown has
    occurred due to matching of multiple virtual page entries during address
    translation. In the R10000 processor, the TS bit indicates a TLB write has
    introduced an entry that would allow matching of more than one virtual page
    entry during translation. In this case, the TLB entries that allow the multiple
    matches, even in the Wired area, are invalidated before the new TLB entry is
    written. This prevents multiple matches during address translation.

    The TS bit is updated for each TLB write. It can also be read and written by software
    (in the R4400, the TS bit is read-only); to clear the TS bit, software writes a 0 into it.
    As in the R4400, Reset/Soft Reset/NMI exceptions also clear the TS bit.

•   The NMI bit is new to the R10000 processor; it distinguishes between Soft
    Reset and NMI exceptions. Both exceptions set the SR bit to 1; the NMI
    exception sets the NMI bit to 1, whereas the Soft Reset exception sets it to 0.

•   The CE bit is reserved in the R10000 processor and should be a 0.
24 23   22   21   20   19   18   17   16
  0     BEV  TS   SR   NMI  CH   CE   DE
  2     1    1    1    1    1    1    1
Figure 13-12 Diagnostic Status Field

Table 13-11 Status Register Diagnostic Status Bits
Bit   Description
BEV   Controls the location of TLB refill and general exception vectors.
      0 → normal
      1 → bootstrap
TS    This bit is set when a TLB write presents an entry that matches any
      other virtual page entry in the TLB. Should this occur, any TLB
      entries that allow multiple matches, even in the Wired area, are
      invalidated before this new entry can be written into the TLB. This
      prevents multiple matches during address translation.
      0 → normal
      1 → TLB shutdown has occurred.
SR    1 → Indicates a Soft Reset or NMI exception.
NMI   1 → Indicates a nonmaskable interrupt has occurred. Used to
      distinguish between a Soft Reset and a nonmaskable interrupt in a
      Soft Reset exception.
CH    Hit (tag match and valid state) or miss indication for last CACHE
      Hit Invalidate, Hit Write Back Invalidate for a secondary cache.
      0 → miss
      1 → hit
CE    Reserved in the R10000, and should be set to 0.
DE    Specifies that cache parity or ECC errors cannot cause exceptions.
      0 → parity/ECC remain enabled
      1 → disables parity/ECC
0     Reserved. Must be written as zeroes, and returns zeroes when read.


Coprocessor Accessibility

Three Status register CU bits control coprocessor accessibility: CU0, CU1, and CU2 enable
coprocessors 0, 1, and 2, respectively. If a coprocessor is unusable, any instruction that
accesses it generates an exception.


The following describes the coprocessor implementations and operations on the R10000:


•   Coprocessor 0 is always enabled in kernel mode, regardless of the CU0 bit.

•   Coprocessor 1 is the floating-point coprocessor. If CU1 is 0 (disabled), all
    floating-point instructions generate a Coprocessor Unusable exception. In
    MIPS IV, the COP3 instruction is replaced with a second floating-point
    instruction, COP1X. In addition, new functions are added to COP1 (see
    Chapter 14, FPU Instructions). The floating-point branch conditional and
    compare instructions are expanded to use the eight Floating-Point Status
    register condition bits, instead of the original single bit. If any of these extra
    bits are referenced (cc > 0) when not using the MIPS IV ISA, an
    Unimplemented Instruction exception is taken. The integer conditional move
    (MOVC) instruction tests a floating-point condition bit; it causes a
    Coprocessor Unusable exception if coprocessor 1 is disabled.

•   Coprocessor 2 is defined, but does not exist in the R10000; its instructions
    (COP2, LWC2, LDC2, SWC2, SDC2) always cause an exception, but the
    exception code depends upon whether the coprocessor, as indicated by CU2,
    is enabled.

•   Coprocessor 3 has been removed from the MIPS III ISA, and is no longer
    defined. If MIPS IV is disabled, the coprocessor 3 instruction (COP3) always
    causes a Reserved Instruction exception.
13.11 Cause Register (13)


The 32-bit read/write Cause register describes the cause of the most recent exception. 


Figure 13-13 shows the fields of this register; Table 13-12 describes the Cause register
fields. A 5-bit exception code (ExcCode) indicates one of the causes, as listed in
Table 13-13.


All bits in the Cause register, with the exception of the IP[1:0] bits, are read-only; IP[1:0]
are used for software interrupts.


Table 13-12 Cause Register Fields

Field     Description
BD        Indicates whether the last exception taken occurred in a branch delay slot.
          1 → delay slot
          0 → normal
CE        Coprocessor unit number referenced when a Coprocessor Unusable
          exception is taken. This field is undefined for any other exception.
IP        Indicates an interrupt is pending. This field remains unchanged for NMI,
          Soft Reset, and Cache Error exceptions.
          1 → interrupt pending
          0 → no interrupt
ExcCode   Exception code field (see Table 13-13)
0         Reserved. Must be written as zeroes, and returns zeroes when read.


Cause Register

31   30   29 28   27       16   15            8   7   6         2   1  0
BD   0     CE          0        IP7  ...  IP0     0     ExcCode       0
1    1     2           12             8           1        5          2

Figure 13-13 Cause Register Format
Table 13-13 Cause Register ExcCode Field

Exception
Code Value   Mnemonic   Description
0            Int        Interrupt
1            Mod        TLB modification exception
2            TLBL       TLB exception (load or instruction fetch)
3            TLBS       TLB exception (store)
4            AdEL       Address error exception (load or instruction fetch)
5            AdES       Address error exception (store)
6            IBE        Bus error exception (instruction fetch)
7            DBE        Bus error exception (data reference: load or store)
8            Sys        Syscall exception
9            Bp         Breakpoint exception
10           RI         Reserved instruction exception
11           CpU        Coprocessor Unusable exception
12           Ov         Arithmetic Overflow exception
13           Tr         Trap exception
14           -          Reserved
15           FPE        Floating-Point exception
16-22        -          Reserved
23           WATCH      Reference to WatchHi/WatchLo address
24-30        -          Reserved
31           -          Reserved
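
An exception handler typically begins by extracting fields from the Cause register. The sketch below uses the bit positions from Figure 13-13 and the mnemonics from Table 13-13; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Field extractors; bit positions follow Figure 13-13. */
bool     cause_bd(uint32_t cause)      { return ((cause >> 31) & 1u) != 0u; }
unsigned cause_ce(uint32_t cause)      { return (cause >> 28) & 0x3u; }
unsigned cause_ip(uint32_t cause)      { return (cause >> 8) & 0xFFu; }
unsigned cause_exccode(uint32_t cause) { return (cause >> 2) & 0x1Fu; }

/* Map an ExcCode value to its mnemonic from Table 13-13.
 * Codes the table lists as reserved return "Reserved". */
const char *exccode_mnemonic(unsigned code)
{
    static const char *const names[32] = {
        [0] = "Int",   [1] = "Mod",   [2] = "TLBL",  [3] = "TLBS",
        [4] = "AdEL",  [5] = "AdES",  [6] = "IBE",   [7] = "DBE",
        [8] = "Sys",   [9] = "Bp",    [10] = "RI",   [11] = "CpU",
        [12] = "Ov",   [13] = "Tr",   [15] = "FPE",  [23] = "WATCH",
    };
    assert(code < 32u);
    return names[code] != NULL ? names[code] : "Reserved";
}
```

A handler could switch on `cause_exccode()` directly; the mnemonic lookup is mainly useful for diagnostics.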
13.12 Exception Program Counter (14)


The Exception Program Counter (EPC) is a read/write register that contains the address at
which processing resumes after an exception has been serviced.


For synchronous exceptions, the EPC register contains either:

•   the virtual address of the instruction that was the direct cause of the exception,
    or

•   the virtual address of the immediately preceding branch or jump instruction
    (when the instruction is in a branch delay slot, and the Branch Delay bit in the
    Cause register is set).

The processor does not write to the EPC register when the EXL bit in the Status register is
set to a 1.
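
The two cases above determine where the instruction that caused the exception is located. As a sketch (the function name is hypothetical), a handler can combine EPC with the BD bit of the Cause register; the delay-slot instruction is the one immediately following the branch, 4 bytes later, since MIPS instructions are 4 bytes long.

```c
#include <assert.h>
#include <stdint.h>

#define CAUSE_BD (1u << 31)   /* Branch Delay bit, Cause[31] */

/* Hypothetical helper: address of the instruction that directly caused a
 * synchronous exception. When BD is set, EPC points at the preceding
 * branch or jump, so the causing instruction is in its delay slot. */
uint64_t faulting_instruction(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) != 0u ? epc + 4u : epc;
}
```

Execution still resumes at EPC itself, so that the branch is re-executed together with its delay slot.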


Figure 13-14 shows the format of the EPC register. 


EPC Register

63                  0
        EPC
        64

Figure 13-14 EPC Register Format


Note: The ErrorEPC register provides a similar capability, described later in this chapter.
13.13 Processor Revision Identifier (PRId) Register (15)


The 32-bit, read-only Processor Revision Identifier (PRId) register contains information
identifying the implementation and revision level of the CPU and CP0. Figure 13-15 shows
the format of the PRId register; Table 13-14 describes the PRId register fields.


PRId Register

31            16 15           8 7          0
       0           Imp (0x09)       Rev
       16              8             8

Figure 13-15 Processor Revision Identifier Register Format


Table 13-14 PRId Register Fields

Field   Description
Imp     Implementation number
Rev     Revision number
0       Reserved. Must be written as zeroes, and returns zeroes when
        read.


The low-order byte (bits 7:0) of the PRId register is interpreted as a revision number, and
the high-order byte (bits 15:8) is interpreted as an implementation number. The
implementation number of the R10000 processor is 0x09. The contents of the high-order
halfword (bits 31:16) of the register are reserved.


The revision number is stored as a value in the form y.x, where y is a major revision number
in bits 7:4 and x is a minor revision number in bits 3:0.


The revision number can distinguish some chip revisions; however, there is no guarantee
that changes to the chip will necessarily be reflected in the PRId register, or that changes to
the revision number necessarily reflect real chip changes. For this reason, software should
not rely on the revision number in the PRId register to characterize the chip.
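
Decoding the register is a matter of shifting and masking, as this sketch shows. The struct and function names are hypothetical; the field positions are those given above.

```c
#include <assert.h>
#include <stdint.h>

#define PRID_IMP_R10000 0x09u   /* implementation number given in the text */

typedef struct {
    unsigned imp;     /* bits 15:8 */
    unsigned major;   /* bits 7:4, the y in y.x */
    unsigned minor;   /* bits 3:0, the x in y.x */
} prid_fields;

/* Hypothetical helper: split a PRId value into its documented fields. */
prid_fields prid_decode(uint32_t prid)
{
    prid_fields f;
    f.imp   = (prid >> 8) & 0xFFu;
    f.major = (prid >> 4) & 0xFu;
    f.minor = prid & 0xFu;
    return f;
}
```

In line with the caution above, such a decode is better suited to reporting than to selecting chip-specific behavior.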
13.14 Config Register (16)


The R10000 processor’s Config register has a different format from that of the R4400, since
the R10000 processor has different mode bits and configurations; however, some fields are
still compatible: K0, DC, IC, and BE. The values of bits 24:0 are taken directly from the
Mode bit settings during a reset sequence; refer to Table 8-1 for these bit definitions.
Table 13-15 shows the R10000 Config register fields, along with values which are hardwired
into the register at boot time; Figure 13-16 shows the Config register format.


Table 13-15 Config Register Field Definitions

Field   Bits     Name (R10000, R12000)                                           Hardwired Values
K0      2:0      Coherency algorithm
                 000₂ → reserved
                 001₂ → reserved
                 010₂ → uncached
                 011₂ → cacheable noncoherent
                 100₂ → cacheable coherent exclusive
                 101₂ → cacheable coherent exclusive on write
                 110₂ → reserved
                 111₂ → uncached accelerated
DN      4:3      Device number
CT      5        CohPrcReqTar
PE      6        PrcElmReq
PM      8:7      PrcReqMax
EC      12:9     SysClkDiv
SB      13       SCBlkSize
SK      14       SCCorEn
BE      15       MemEnd
SS      18:16    SCSize
SC      21:19    SCClkDiv
SD      22†      Reserved (R10000); SC Data and Tag Corrector disable (R12000)
PDR     23††     Reserved (R10000); Processor coherency data response (R12000)
0       25:24    Reserved
DC      28:26    Primary data cache size (hardwired to 011₂)                     32 Kbytes
IC      31:29    Primary instruction cache size (hardwired to 011₂)              32 Kbytes

† Bit 22 of the Config register is ‘SC Data and Tag Corrector disable’. This bit turns off use of ECC to correct errors in the SC data and tags.

†† When Bit[23] of the Config register is set, the response that the R12000 produces to an external intervention (shared or exclusive) which hits on a
CleanExclusive line is changed. As before, the state of the line in the cache is changed, and the former state of the line is sent out on SysState[1:0]. Moreover,
when Bit[23] of Config is set, a processor coherency data response is sent with the state response. In other words, when this bit is set, external interventions
which hit CleanExclusive or DirtyExclusive lines in the Secondary Cache result in a processor coherency data response.


Config Register

31 29  28 26  25 24  23   22  21 19  18 16  15  14  13  12  9  8 7  6   5   4 3  2  0
  IC     DC     0    PDR  SD    SC     SS   BE  SK  SB    EC    PM  PE  CT  DN   K0
  3      3      2    1    1     3      3    1   1   1     4     2   1   1   2    3

Figure 13-16 Config Register Format
13.15 Load Linked Address (LLAddr) Register (17)


Physical addresses for Load Link instructions are no longer written into this register.
LLAddr is implemented as a read/write scratch register used for NT compatibility.


Figure 13-17 shows the format of the LLAddr register.


LLAddr Register

31                  0
      R/W (NT)
         32

Figure 13-17 LLAddr Register Format
13.16 WatchLo (18) and WatchHi (19) Registers


WatchHi and WatchLo are 32-bit read/write registers which contain a physical address of a
doubleword location in main memory. If enabled, any attempt to read or write this location 
causes a Watch exception. This feature is used for debugging. 


Bits 7:0 of the WatchHi register contain bits 39:32 of the trap physical address, shown in 
Figure 13-18. The WatchLo register contains physical address bits 31:3. The remaining 
bits of the register are ignored on write and read as zero. 


Table 13-16 describes the WatchLo and WatchHi register fields. 


WatchLo Register

31                    3  2   1   0
        PAddr0           0   R   W
          29             1   1   1

WatchHi Register

31                  8 7          0
          0             PAddr1
          24               8

Figure 13-18 WatchLo and WatchHi Register Formats


Table 13-16 WatchHi and WatchLo Register Fields

Field    Description
PAddr1   Bits 39:32 of the physical address
PAddr0   Bits 31:3 of the physical address
R        Trap on load references if set to 1
W        Trap on store references if set to 1
0        Ignored on write and read as zero.
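
Setting a watchpoint amounts to splitting a 40-bit physical address across the two registers and adding the R and W enables. The following sketch shows the split; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WATCHLO_W (1u << 0)   /* trap on store references */
#define WATCHLO_R (1u << 1)   /* trap on load references */

typedef struct {
    uint32_t lo;   /* value for WatchLo (18) */
    uint32_t hi;   /* value for WatchHi (19) */
} watch_regs;

/* Hypothetical helper: encode a watch on the doubleword containing paddr.
 * The low three address bits are dropped because the registers hold a
 * doubleword address (PAddr0 is bits 31:3). */
watch_regs watch_pack(uint64_t paddr, bool on_load, bool on_store)
{
    assert(paddr < (1ull << 40));
    watch_regs w;
    w.lo = (uint32_t)(paddr & 0xFFFFFFF8ull)
         | (on_load ? WATCHLO_R : 0u)
         | (on_store ? WATCHLO_W : 0u);
    w.hi = (uint32_t)(paddr >> 32) & 0xFFu;
    return w;
}
```

Passing false for both flags yields a value with the watch disabled, since neither trap condition is set.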




13.17 XContext Register (20) 


The read/write XContext register contains a pointer to an entry in the page table entry (PTE) 
array, an operating system data structure that stores virtual-to-physical address translations. 
When there is a TLB miss, the operating system software loads the TLB with the missing 
translation from the PTE array. The XContext register no longer shares the information
provided in the BadVAddr register, as it did in the R4400.


The XContext register is for use with the XTLB refill handler, which loads TLB entries for 
references to a 64-bit address space, and is included solely for operating system use. The 
operating system sets the PTE base field in the register, as needed. Normally, the operating 
system uses the Context register to address the current page map, which resides in the 
kernel-mapped segment kseg3. 


Figure 13-19 shows the format of the XContext register; Table 13-17 describes the XContext
register fields.


XContext Register
63             37 36 35 34            4 3     0
     PTEBase        R       BadVPN2        0
       27           2         31           4

Figure 13-19 XContext Register Format


The 31-bit BadVPN2 field holds bits 43:13 of the virtual address that caused the TLB miss;
bit 12 is excluded because a single TLB entry maps to an even-odd page pair. For a 4-Kbyte
page size, this format may be used directly to address the pair-table of 8-byte PTEs. For
other page and PTE sizes, shifting and masking this value produces the appropriate address.


Errata: The 0 field in Table 13-17 is revised.

Table 13-17 XContext Register Fields

Field     Description
BadVPN2   The Bad Virtual Page Number/2 field is written by hardware on a miss. It contains the VPN of the most
          recent invalidly translated virtual address.
R         The Region field contains bits 63:62 of the virtual address.
          00₂ = user
          01₂ = supervisor
          11₂ = kernel
0         Reserved. Must be written as zeroes, and returns zeroes when read.
PTEBase   The Page Table Entry Base read/write field is normally written with a value that allows the operating
          system to use the Context register as a pointer into the current PTE array in memory.
13.18 FrameMask Register (21)


The FrameMask register is new with the R10000 processor. It masks bits of the EntryLo0
and EntryLo1 registers so that these masked bits are not passed to the TLB while doing a
TLB write (either TLBWI or TLBWR).


A zero in the FrameMask register allows its corresponding bit in the EntryLo[1,0] registers
to pass to the TLB; a one in the FrameMask register masks off its corresponding bit in the
EntryLo registers and passes a zero to the TLB. Bits 15:0 of the FrameMask register
control bits 33:18 of the EntryLo registers.


The remaining bits of this register are ignored on write and read as zeroes. The content of 
this register is set to zero after a processor reset or a power-up event. 
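
The masking rule can be expressed in a single line of arithmetic. The sketch below models the value that reaches the TLB on a TLB write; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAMEMASK_BITS   0xFFFFu   /* FrameMask[15:0] */
#define FRAMEMASK_SHIFT  18        /* FrameMask bit 0 controls EntryLo bit 18 */

/* Hypothetical model: EntryLo value as passed to the TLB once FrameMask is
 * applied. A one in FrameMask forces the corresponding EntryLo bit
 * (33:18) to zero; all other bits pass through unchanged. */
uint64_t framemask_apply(uint64_t entrylo, uint32_t framemask)
{
    uint64_t blocked = (uint64_t)(framemask & FRAMEMASK_BITS) << FRAMEMASK_SHIFT;
    return entrylo & ~blocked;
}
```

With FrameMask at its reset value of zero, the function returns EntryLo unchanged.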


Figure 13-20 shows the FrameMask register format. 


FrameMask Register

31             16 15                      0
       0            Mask bits, PA[39:24]
       16                   16

Figure 13-20 FrameMask Register Format
13.19 Diagnostic Register (22)


Errata


CP0 register 22, the Diagnostic register, is a new 64-bit register for processor-specific
diagnostic functions. (Since this register is designed for local use, the diagnostic functions
are subject to change without notice.) Currently, this register helps test the ITLB, branch
caches, and the branch prediction scheme. In addition, it provides choices for branch
prediction algorithms, to help diagnostic program writing.


The twelve fields of the Diagnostic register, shown in Figure 13-21, are described below. 
All fields are read-only (all writes are ignored). 


ITLBM: this field is a 4-bit read-only counter. This field is incremented by one for each 
ITLB miss, and any overflow is ignored. Its value is undefined during reset, and its value 
is meaningless when used in an unmapped space. 


BSIdx: this field defines the entry in the branch stack to be used for the latest conditional 
branch decoded. Its value is meaningless if the latest branch was an unconditional branch. 


DBRC: this field disables the use of the branch return cache (BRC).


BRCV: this field indicates whether or not the branch return cache (BRC) is valid. The BRC has
only one entry (four instructions).


BRCW: this field indicates whether or not the latest branch (JAL, JALR rx, BGEZAL,
BGEZALL, BLTZAL, or BLTZALL) caused a write into the BRC. It is not affected by any
other type of branch.


BRCH: this field indicates whether or not the latest branch (JR r31 or JALR rx,r31) has a
BRC hit. It is not affected by any other type of branch.


MP: this field indicates whether or not the latest conditional branch verified was 
mispredicted. 


BPMode: this is a read-write field for branch prediction algorithm control. 


00₂: 2-bit counter scheme
01₂: all conditional branches are predicted not taken
10₂: all conditional branches are predicted taken
11₂: forward conditional branches are predicted not taken and backward conditional
branches are predicted taken.


The default mode is 00 on processor reset. 
BPState: this field contains the new 2-bit state for a conditional branch after it is verified.
It is also used to hold the 2-bit state to read/write when a branch prediction table read/write
operation is executed.


BPIdx: this field contains the index to the Branch Prediction Table (BPT) for BPT read/
write/initialization operations, and should contain VA[11:3] of the branch for BPT read/
write operations. The upper six bits of the BPIdx field contain the line address for BPT line
initialization operations; the lower three bits of BPIdx are ignored.


BPOp: this field indicates the following BPT operations:

00₂: BPT read
01₂: BPT write
10₂: initializes BPT line to all zeroes (strongly not taken)
11₂: initializes BPT line to all ones (strongly taken).


Errata 


0: Reserved. Must be written as zeroes, and returns zeroes when read.


Figure 13-21 shows the format of the Diagnostic register. 


Bits    Field    Width
63:52   0        12
51:48   ITLBM    4
47:32   0        16
31:28   BSIdx    4
27:23   0        5
22      DBRC     1
21      BRCV     1
20      BRCW     1
19      BRCH     1
18      MP       1
17:16   BPMode   2
15:14   BPState  2
13:12   0        2
11:3    BPIdx    9
2       0        1
1:0     BPOp     2


Figure 13-21 Diagnostic Register Format 
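As an illustration of the layout in Figure 13-21, the following C helpers pull individual fields out of a 64-bit Diagnostic register value. This is a sketch only: the helper names are hypothetical, and the bit positions are the ones read from the figure above (ITLBM in bits 51:48, MP in bit 18, BPMode in bits 17:16, BPState in bits 15:14).

```c
#include <stdint.h>

/* Hypothetical field accessors for a Diagnostic register value
 * that has already been read with MFC0/DMFC0. */
static unsigned diag_field(uint64_t diag, unsigned lsb, unsigned width)
{
    return (unsigned)((diag >> lsb) & ((1ULL << width) - 1));
}

unsigned diag_itlbm(uint64_t diag)   { return diag_field(diag, 48, 4); } /* ITLB miss count */
unsigned diag_mp(uint64_t diag)      { return diag_field(diag, 18, 1); } /* last branch mispredicted */
unsigned diag_bpmode(uint64_t diag)  { return diag_field(diag, 16, 2); } /* prediction algorithm */
unsigned diag_bpstate(uint64_t diag) { return diag_field(diag, 14, 2); } /* 2-bit predictor state */
```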
There are two ways to read the branch prediction state from the Branch Prediction Table
(BPT):


• Place an mfc0 rx, C0_Diag (a Move From Diagnostic register to GPR rx) in
  the delay slot of the conditional branch. This read of the Diagnostic register
  returns the next predicted state from the branch stacks before the BPT is
  updated.


• Move the Index and the BPT read operation into the Idx and BPOp fields of the
  Diagnostic register. This mtc0 into C0_Diag graduates as soon as the write is
  completed; however, there could be a significant delay in transferring the data
  from the BPT to C0_Diag. This delay occurs because C0_Diag has a lower
  priority to access the BPT than IFETCH and other
  processes. Thus, the prediction state read from C0_Diag may not reflect
  the content of the BPT. Use the code sequence shown below to get the correct
  prediction state from the BPT:


          li   rx                # rx has index and BPT read for
                                 # Idx and BPOp, respectively.
          mtc0 rx, C0_Diag       # Set the Diagnostic register for reading the BPT
          la   ry, label         # ry != r31; la could be replaced by a dla for 64-bits
          jr   ry                # This gives priority for C0_Diag to access BPT
  label:  mfc0 rz, C0_Diag       # rz holds the state from BPT entry pointed by Idx
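The value loaded into rx by the sequence above combines the BPT index and the operation code. A minimal C sketch of that packing is given here, assuming BPIdx occupies bits 11:3 and BPOp bits 1:0 as in Figure 13-21; the names `diag_bpt_command` and `BPT_*` are hypothetical, and the BPState field that a BPT write would also need is deliberately left out.

```c
#include <stdint.h>

/* BPOp encodings as listed for the Diagnostic register. */
enum bpt_op { BPT_READ = 0, BPT_WRITE = 1, BPT_INIT_ZEROES = 2, BPT_INIT_ONES = 3 };

/* Hypothetical helper: build the low bits of a Diagnostic register
 * value for a BPT operation on the entry used by the branch at
 * branch_va. BPIdx holds VA[11:3] of the branch, so those address
 * bits can be copied in place. */
uint32_t diag_bpt_command(uint32_t branch_va, enum bpt_op op)
{
    uint32_t idx_field = branch_va & 0xFF8u;  /* VA[11:3] -> bits 11:3 */
    return idx_field | ((uint32_t)op & 0x3u); /* BPOp -> bits 1:0 */
}
```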


<12000> 


In the R12000, two fields are added to the Diagnostic register (CP0 register 22). One field is
“ghistory enable”, bits 26:23. The other is “BTAC disable”, bit 27.


The definitions are: 


Ghistory enable: 


If bit 26 is set, branch prediction uses all eight bits of the global history register. 
If bit 26 is not set, then bits 25:23 specify a count of the number of bits of global 
history to be used. Thus if bits 26:23 are all zero, global history is disabled. 


The global history contains a record of the taken/not-taken status of recently
executed branches and, when used, is XORed with the PC of a branch being
predicted to produce a hashed value for indexing the BPT. Some programs with a
small “working set of conditional branches” benefit significantly from the use of
such hashing; others see a slight performance degradation.
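The ghistory enable encoding can be decoded as in the C sketch below. The function name `ghistory_bits` is hypothetical; the logic only restates the rule given above (bit 26 selects all eight bits, otherwise bits 25:23 give the count).

```c
#include <stdint.h>

/* Hypothetical helper: number of global history bits used for
 * branch prediction, decoded from an R12000 Diagnostic register value. */
unsigned ghistory_bits(uint64_t diag)
{
    if ((diag >> 26) & 1u)
        return 8;                        /* bit 26 set: all eight bits */
    return (unsigned)((diag >> 23) & 7u); /* bits 25:23: count, 0 = disabled */
}
```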


BTAC disable: 


If bit 27 is set, the use of the Branch Target Address Cache (BTAC) is disabled. 
The BTAC is used to reduce the instruction fetch penalty of taken branches by 
providing the target address of fixed-address branch and jump instructions. 
13.20 Performance Counter Registers (25)


The R10000 processor defines two performance counters and two associated control
registers, which are mapped into CP0 register 25. An encoding in the MTC0/MFC0
instructions on register 25 indicates which counter or control register is used.


Each counter is a 32-bit read/write register and is incremented by one each time the 
countable event, specified in its associated control register, occurs. Each counter can 
independently count one type of event at a time. 


The counter asserts an interrupt, IP[7], when its most significant bit (bit 31) becomes one
(the counter overflows) and the associated performance control register enables the
interrupt.


The counting continues after counter overflow whether or not an interrupt is signalled. 


The format of the control registers is shown in Figure 13-22.


 31            9  8     5   4    3   2   1    0
|      0        | Event | IE | U | S | K | EXL |
       23           4     1    1   1   1    1


Figure 13-22 Control Register Format 
The fields of the Control register are:


• The Event field specifies the event to be counted, listed in Table 13-18.


Table 13-18 Counter Events


Event  Counter 0                                        Counter 1
0      Cycles                                           Cycles
1      Instructions issued                              Instructions graduated
2      Load/prefetch/sync/CacheOp issued                Load/prefetch/sync/CacheOp graduated
3      Stores (including store-conditional) issued      Stores (including store-conditional) graduated
4      Store conditional issued                         Store conditional graduated
5      Failed store conditional                         Floating-point instructions graduated
6      Branches resolved                                Quadwords written back from primary data cache
7      Quadwords written back from secondary cache      TLB refill exceptions
8      Correctable ECC errors on secondary cache data   Branches mispredicted
9      Instruction cache misses                         Secondary cache load/store and cache-ops
                                                        operations
10     Secondary cache misses (instruction)             Secondary cache misses (data)
11     Secondary cache way mispredicted (instruction)   Secondary cache way mispredicted (data)
12     External intervention requests                   External intervention request is determined to have
                                                        hit in secondary cache
13     External invalidate requests                     External invalidate request is determined to have hit
                                                        in secondary cache
14     Functional unit completion cycles                Stores or prefetches with store hint to
                                                        CleanExclusive secondary cache blocks
15     Instructions graduated                           Stores or prefetches with store hint to Shared
                                                        secondary cache blocks
Errata 


Made various changes to Table 13-18, as indicated by the underlines. Note that the
updated material reflects the functionality of silicon revision 3.0 and later. The status of
earlier silicon revisions is documented as silicon errata available on www.mips.com.


• The IE bit enables the assertion of IP[7] when the associated counter
  overflows.


• The U, S, K, and EXL bits indicate the processor modes in which the event is
  counted: U is user mode; S is supervisor mode; K is kernel mode when EXL
  and ERL both are set to 0; EXL is kernel mode while handling an
  exception (EXL is set to 1), as shown in Table 13-22.


• 0: Reserved. Must be written as zeroes, and returns zeroes when read.


These modes can be set individually; for example, one could set all four bits to count a 
certain event in all processor modes except during a cache error exception. 
Errata


In describing the rules that are applied for the counting of each event listed in Table 13-18,
the following terminology is used:


Done is defined as the point at which the instruction is successfully executed by the 
functional unit but is not yet graduated. 


Graduated is defined as the point in time when the instruction is successfully executed
(done), and it is the oldest instruction.


Secondary Cache Transaction Processing (SCTP) logic is on-chip logic in which up
to four internally generated and one externally generated secondary cache
transactions are queued to be processed.


The following rules apply for the counting of each event listed in Table 13-18:


Event 0 for Counter 0 and Counter 1: Cycles 


The counter is incremented on each PClk cycle.


Event 1 for Counter 0: Instructions Issued 


The counter is incremented on each cycle by the sum of the three following events:


• Integer operations marked as done on the cycle. 0, 1 or 2 such operations can
  be marked on each cycle. Since these operations (all except for MUL and DIV)
  are marked done on the cycle following their being issued to a functional unit,
  this number is nearly identical to the number issued. The only difference is that
  re-issues are not counted.


• Floating point operations marked done in the active list. Possible values are 0, 1
  or 2. Since these operations take more than one cycle to complete, it is possible
  for an instruction to be issued and then aborted before it is counted, due to a
  branch-misprediction or exception rollback.


• Load/store instructions first issued to the address calculation unit on the
  previous cycle. Possible values are 0 or 1. Prefetch instructions are counted as
  issued. Load/store instructions are counted as being issued only once, even
  though they may have been issued more than one time.* Any instruction which
  does not go to the load/store unit, integer functional unit, or FP functional unit is
  not counted. Some of those not counted are: nops, bc1{f,t,fl,tl}, break, syscall, j,
  jal, jr, jalr, cp0 instructions.


Event 1 for Counter 1: Instruction Graduation.


The counter is incremented by the number of instructions that were graduated on the
previous cycle. When an integer multiply or divide instruction graduates, it is counted as
two instructions.


Event 2 for Counter 0: Load/Prefetch/Sync/CacheOp Issue.


Each of these instructions is counted as it is issued. A load instruction is only counted
once, even though it may have been issued more than one time.*


Event 2 for Counter 1: Load/Prefetch/Sync/CacheOp Graduation.


Each of these instructions is counted as it graduates. Up to four loads can graduate
in one cycle.


* This could be a result of the DCache Tag being busy or four Instruction or Data cache misses
already present and waiting to be processed in the Secondary Cache Transaction Processing
(SCTP) logic.
Event 3 for Counter 0: Stores (Including Store-Conditional) Issued.


The counter is incremented on the cycle after a store instruction is issued to the address- 
calculation unit. Note that a store can only be counted as having been issued once, even 
though it may actually be issued more than once due to DCache Tag being busy or there 
already being four load/store cache misses waiting in the SCTP logic. 


Event 3 For Counter 1: Store (Including Store-Conditional) Graduation. 


Each graduating store (including SC) increments the counter. At most one store can 
graduate per cycle. 


Event 4 for Counter 0: Store-Conditional Issued. 


This counter is incremented on the cycle after a store conditional instruction is issued to the 
address-calculation unit. Note that an SC can only be counted as having been issued once, 
even though it may actually be issued more than once due to DCache Tag being busy or 
there already being four load/store cache misses waiting in the SCTP logic. 


Event 4 for Counter 1: Store-Conditional Graduation. 


At most, one store-conditional can graduate per cycle. This counter is incremented on the
cycle following the graduation of a store-conditional instruction.


Event 5 for Counter 0: Failed Store Conditional. 


This counter is incremented when a store-conditional instruction fails. 


Event 5 for Counter 1: Floating-Point Instruction Graduation. 


This counter is incremented by the number of FP instructions which graduated on the
previous cycle. Any instruction that sets the FP Status register bits (EVZOUI) is counted as
a graduated floating-point instruction. There can be 0 to 4 such instructions each cycle.


Event 6 for Counter 0: Conditional Branch Resolved 


This counter is incremented when a conditional branch is determined to have been
“resolved.”† Note that when multiple floating-point conditional branches are resolved in a
single cycle, this counter is still only incremented by one. Although this is a rare event, in
this case the count would be incorrect.


† In other words, this count is the sum of the conditional branches that are known to be
correctly predicted and those known to be mispredicted.
Event 6 for Counter 1: Quadwords Written Back From Primary Data Cache


This counter is incremented once each cycle that a quadword of data is written from
primary data cache to secondary cache.


Event 7 for Counter 0: Quadwords Written Back From Secondary Cache 


This counter is incremented once each cycle that a quadword of data is written back from 
the secondary cache to the outgoing buffer located in the on-chip system-interface unit. 
(Note that data from the outgoing buffer could be invalidated by an external request and not 
sent out of the processor.) 


Event 7 for Counter 1: TLB Refill Exception (Due To TLB Miss) 


This counter is incremented on the cycle after the TLB miss handler is invoked. All TLB 
misses are counted, whether they occur in the native code or within the TLB handler. 


Event 8 for Counter 0: Correctable ECC Errors On Secondary Cache Data. 


This counter is incremented on the cycle after the correction of a single-bit error on a 
quadword read from the secondary cache data array. 


Event 8 for Counter 1: Branch Misprediction. 


This counter is incremented on the cycle after a branch is restored because of misprediction. 
Note that the misprediction is determined on the same cycle that the conditional branch is 
resolved. The misprediction rate is the ratio of branch mispredicted count to conditional 
branch resolve count. 
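The ratio described above can be computed from the two counter readings as in this C sketch. The function name is hypothetical; it assumes Counter 1 was programmed for event 8 (branches mispredicted), Counter 0 for event 6 (branches resolved), and that both were sampled over the same interval.

```c
#include <stdint.h>

/* Hypothetical helper: misprediction rate = mispredicted / resolved.
 * Returns 0.0 when no conditional branch was resolved, to avoid
 * dividing by zero. */
double misprediction_rate(uint32_t mispredicted, uint32_t resolved)
{
    if (resolved == 0)
        return 0.0;
    return (double)mispredicted / (double)resolved;
}
```

Because event 6 can undercount when several floating-point branches resolve in one cycle, the result is best treated as an estimate.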


Event 9 for Counter 0: Primary Instruction Cache Misses. 


This counter is incremented one cycle after an instruction refill request is sent to the SCTP 
logic. 


Event 9 for Counter 1: Secondary Cache Load/Store and Cache-ops Operations 


This counter is incremented one cycle after a request is entered into the SCTP logic,
provided the request was initially targeted at the primary data cache. Such requests fall into
three categories:


• primary data cache misses


• requests to change the state of secondary and primary data cache
  lines from Clean to Dirty, due to stores hitting a clean line in the primary data
  cache


• requests initiated by Cache-op instructions
Event 10 for Counter 0: Secondary Cache Misses (Instruction)


This counter is incremented the cycle after the last quadword of a primary instruction cache
line is written from the main memory, while the secondary cache refill continues.


Event 10 for Counter 1: Secondary Cache Misses (Data) 


This counter is incremented the cycle after the second quadword of a data cache line is 
written from the main memory, while the secondary cache refill continues. 


Event 11 for Counter 0: Secondary Cache Way Misprediction (Instruction) 


This counter is incremented when the secondary cache controller begins to retry an access
to the secondary cache after it hit in the non-predicted way, provided the secondary cache
access was initiated by the primary instruction cache.


Event 11 for Counter 1: Secondary Cache Way Misprediction (Data)


This counter is incremented when the secondary cache controller begins to retry an access
to the secondary cache because it hit in the non-predicted way, provided the secondary
cache access was initiated by the primary data cache.


Event 12 for Counter 0: External Intervention Requests 


This counter is incremented on the cycle after an external intervention request enters the 
SCTP logic. 


Event 12 for Counter 1: External Intervention Request Hits In Secondary Cache


This counter is incremented on the cycle after an external intervention request is 
determined to have hit in the secondary cache. 


Event 13 for Counter 0: External Invalidate Requests 


This counter is incremented on the cycle after an external invalidate request enters the 
SCTP logic. 


Event 13 for Counter 1: External Invalidate Request Hits In Secondary Cache


This counter is incremented on the cycle after an external invalidate request is determined
to have hit in the secondary cache.
Event 14 for Counter 0: Functional Unit Completion Cycles


This counter is incremented once on the cycle after at least one of the functional units — 
ALU1, ALU2, FPU1, or FPU2 — marks an instruction as done. 


Event 14 for Counter 1: Stores, or Prefetches with Store Hint to Clean Exclusive Secondary Cache 
Blocks. 


This counter is incremented on the cycle after a request to change the Clean Exclusive state 
of the targeted secondary cache line to Dirty Exclusive is sent to the SCTP logic. 


Event 15 for Counter 0: Instruction Graduation. 


This counter is incremented by the number of instructions that were graduated on the 
previous cycle. When an integer multiply or divide instruction graduates, it is counted as 
two graduated instructions. 


Event 15 for Counter 1: Stores or Prefetches with Store Hint to Shared Secondary Cache Blocks. 


This counter is incremented on the cycle after a request to change the Shared state of the 
targeted secondary cache line to Dirty Exclusive is sent to the SCTP logic. 


The performance counters and associated control registers are written by using an MTC0
instruction, as shown in Table 13-19.


Table 13-19 Writing Performance Registers Using MTC0


Opcode[15:11]   Opcode[1:0]   Operation
11001           00            Move to Performance Control 0
11001           01            Move to Performance Counter 0
11001           10            Move to Performance Control 1
11001           11            Move to Performance Counter 1


The performance counters and associated control registers are read by using an MFC0
instruction, as shown in Table 13-20.


Table 13-20 Reading Performance Registers Using MFC0


Opcode[15:11] Opcode[1:0] Operation 
11001 00 Move from Performance Control 0 
11001 01 Move from Performance Counter 0 
11001 10 Move from Performance Control 1 
11001 11 Move from Performance Counter 1 
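To show how the two tables map onto instruction words, the C sketch below assembles the 32-bit encodings by hand. It assumes the standard MIPS COP0 move formats (major opcode 010000, rs field 00000 for MFC0 and 00100 for MTC0), with Opcode[15:11] = 11001 selecting CP0 register 25 and Opcode[1:0] selecting the counter or control register. The function and enumerator names are hypothetical.

```c
#include <stdint.h>

/* Opcode[1:0] selectors from Tables 13-19 and 13-20. */
enum perf_sel { PERF_CONTROL0 = 0, PERF_COUNTER0 = 1, PERF_CONTROL1 = 2, PERF_COUNTER1 = 3 };

/* Hypothetical encoder: build an MTC0 (to_cp0 != 0) or MFC0
 * (to_cp0 == 0) instruction word that moves GPR rt to or from one
 * of the four performance registers. */
uint32_t encode_perf_move(int to_cp0, unsigned rt, enum perf_sel sel)
{
    uint32_t word = 0x10u << 26;      /* COP0 major opcode */
    if (to_cp0)
        word |= 0x04u << 21;          /* MT sub-opcode in the rs field */
    word |= (uint32_t)(rt & 0x1Fu) << 16; /* GPR number */
    word |= 25u << 11;                /* Opcode[15:11] = 11001 */
    word |= (uint32_t)sel & 0x3u;     /* Opcode[1:0] */
    return word;
}
```

An assembler that lacks a select operand for these moves could emit such a word directly, for example with a `.word` directive.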
The format of the performance control registers is shown in Table 13-21.


Table 13-21 Performance Control Register Format


[8:5]          [4]                      [3:0]
Event select   IP[7] interrupt enable   Count enable (U/S/K/EXL)


The count enable field specifies whether counting is to be enabled during User, Supervisor, 
Kernel, and/or Exception level mode. Any combination of count enable bits may be 
asserted. 


All unused bits in the performance control registers are reserved. 
All counting is disabled when the ERL bit of the CPO Status register is asserted. 
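A performance control register value can be assembled from these fields as in the following C sketch. The constant and function names are hypothetical; the bit positions follow Table 13-21 and Figure 13-22 (event select in bits 8:5, interrupt enable in bit 4, U/S/K/EXL in bits 3:0).

```c
#include <stdint.h>

/* Hypothetical flag names for bits 4:0 of a performance control register. */
enum {
    PERF_CTL_IE  = 1u << 4, /* enable IP[7] on counter overflow */
    PERF_CTL_U   = 1u << 3, /* count in user mode */
    PERF_CTL_S   = 1u << 2, /* count in supervisor mode */
    PERF_CTL_K   = 1u << 1, /* count in kernel mode */
    PERF_CTL_EXL = 1u << 0  /* count at exception level */
};

/* Hypothetical helper: combine an event number (0..15) with
 * interrupt-enable and count-enable flags. Reserved bits stay zero. */
uint32_t perf_control(unsigned event, unsigned flags)
{
    return ((uint32_t)(event & 0xFu) << 5) | (flags & 0x1Fu);
}
```

For example, `perf_control(8, PERF_CTL_IE | PERF_CTL_U)` selects event 8 in user mode with the overflow interrupt enabled.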


Table 13-22 defines the operation of the count enable bits of the performance control 
registers. 


Table 13-22 Count Enable Bit Definition 


Count Enable Bit   Count Qualifier (CP0 Status Register Fields)
U                  KSU = 2 (User mode), EXL = 0, ERL = 0
S                  KSU = 1 (Supervisor mode), EXL = 0, ERL = 0
K                  KSU = 0 (Kernel mode), EXL = 0, ERL = 0
EXL                EXL = 1, ERL = 0
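The qualifiers in Table 13-22 can be expressed as a small predicate. The C sketch below is a software model only, with hypothetical names; it takes the low bits of a performance control register together with the KSU, EXL and ERL fields of the Status register and reports whether events would be counted.

```c
#include <stdint.h>

/* Count enable bits, performance control register bits 3:0. */
#define CE_U   (1u << 3)
#define CE_S   (1u << 2)
#define CE_K   (1u << 1)
#define CE_EXL (1u << 0)

/* Hypothetical model of Table 13-22: returns nonzero if the counter
 * counts under the given Status register fields. */
int perf_count_active(uint32_t control, unsigned ksu, int exl, int erl)
{
    if (erl)
        return 0;                          /* ERL = 1 disables all counting */
    if (exl)
        return (control & CE_EXL) != 0;    /* exception level */
    switch (ksu) {
    case 2:  return (control & CE_U) != 0; /* user mode */
    case 1:  return (control & CE_S) != 0; /* supervisor mode */
    case 0:  return (control & CE_K) != 0; /* kernel mode */
    default: return 0;                     /* KSU = 3 is not listed in the table */
    }
}
```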


The following rules apply: 


• The performance counter registers may be preloaded with an MTC0
  instruction, and counting is enabled by asserting one or more of the count
  enable bits in the performance control registers.


• The interrupt enable bit must be asserted to cause IP[7].


• To determine the cause of the interrupt, the interrupt handler routine must
  query the following:

  - the performance counter register

  - the interrupt enable bit of the associated performance control register of
    both counters


• If neither of the counters caused the interrupt, IP[7] must be the result of the
  CP0 Count register matching the CP0 Compare register.
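The interrupt-source check in these rules might be coded along the following lines. This is a sketch under the stated rules only: the function name and return convention are hypothetical, and the four arguments are assumed to have been read with MFC0 beforehand.

```c
#include <stdint.h>

#define PERF_IE_BIT   (1u << 4)  /* interrupt enable, control register bit 4 */
#define PERF_OVERFLOW (1u << 31) /* counter most significant bit */

/* Hypothetical helper for an IP[7] handler. Returns a bit mask:
 * bit 0 set if counter 0 is a source, bit 1 set if counter 1 is a
 * source. A result of 0 means neither counter caused IP[7], so the
 * Count/Compare match must be the source. */
unsigned perf_irq_sources(uint32_t cnt0, uint32_t ctl0,
                          uint32_t cnt1, uint32_t ctl1)
{
    unsigned sources = 0;
    if ((cnt0 & PERF_OVERFLOW) && (ctl0 & PERF_IE_BIT))
        sources |= 1u;
    if ((cnt1 & PERF_OVERFLOW) && (ctl1 & PERF_IE_BIT))
        sources |= 2u;
    return sources;
}
```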
13.21 ECC Register (26)


The R10000 processor implements a 10-bit read/write ECC register which is used to read
and write the secondary cache data ECC or the primary cache data parity bits. (Tag ECC
and parity are loaded to and stored from the TagLo register.) Unlike the R4400, the only
CacheOps that use the ECC register are Index Load Data and Index Store Data.


In the R4400, both the primary instruction and data caches are parity byte-protected. 


In the R10000 processor, the following protection schemes are used: 


• The primary instruction cache is word-protected (where one word contains 36
  bits), and one parity bit is used for each instruction word (IP in Figure 13-23).


• The primary data cache is byte-protected, with four bits used for each 32-bit
  data word (DP in Figure 13-23).


• Each quadword of the secondary cache data uses nine bits of ECC and one bit
  of parity (SP and ECC in Figure 13-23).


The primary instruction CacheOps load or store one instruction word at a time; therefore,
one bit is used in the ECC register. The primary data CacheOps load or store four bytes at
a time; therefore, four bits are used in the ECC register. The secondary CacheOps use
ECC[9] as the parity bit and ECC[8:0] as the 9-bit ECC. For the Index Store Data
CacheOps, the unused bits are ignored. For Index Load Data CacheOps, the unused bits
are filled with zeroes.


Figure 13-23 shows the format of the ECC register; Table 13-23 describes the register
fields.


 31              10   9   8            0
|        0          | SP |     ECC      |
         22           1        9


Figure 13-23 ECC Register Format


Table 13-23 ECC Register Fields


Field   Description
SP      A 1-bit field specifying the parity bit read from or written to a secondary
        cache.
ECC     A 9-bit field specifying the ECC bits read from or written to a secondary
        cache.
DP      A 4-bit field specifying the parity bits read from or written to a primary
        data cache.
IP      A 1-bit field specifying the parity bit read from or written to a primary
        instruction cache.
0       Reserved. Must be written as zeroes, and returns zeroes when read.
13.22 CacheErr Register (27)


The CacheErr register is a 32-bit read-only register that handles ECC errors in the
secondary cache or system interface, and parity errors in the primary caches.


The R10000 processor correction policy is as follows:


• Parity errors cannot be corrected.


• Single-bit ECC errors can be corrected by hardware without taking a Cache
  Error exception.


• Double-bit ECC errors can be detected but not corrected by hardware.


• All uncorrectable errors take Cache Error exceptions unless the DE bit of the
  Status register is set.


• As in the R4400, cache errors are imprecise.


The CacheErr register provides cache index and status bits which indicate the source and 
nature of the error; it is loaded when a Cache Error exception is taken. 


CacheErr Register Format for Primary Instruction Cache Errors 


Figure 13-24 shows the format of the CacheErr register when a primary instruction cache 
error occurs. 


 31 30  29   28  27 26  25 24  23 22  21    14  13     6  5     0
|  00 | EW | 0 |  D   |  TA  |  TS  |    0    |  PIdx   |   0    |
   2    1    1    2      2      2        8         8         6


Figure 13-24 CacheErr Register Format for Primary Instruction Cache Errors


EW: set when CacheErr register is already holding the values of a previous error

D: data array error (way1 || way0)

TA: tag address array error (way1 || way0)

TS: tag state array error (way1 || way0)

PIdx: primary cache virtual block index, VA[13:6]


Errata 


0: Reserved. Must be written as zeroes, and returns zeroes when read. 
CacheErr Register Format for Primary Data Cache Errors


Errata 


Figure 13-25 shows the format of the CacheErr register when a primary data cache error
occurs.


 31 30  29   28   27 26  25 24  23 22  21 20  19   14  13     3  2    0
|  01 | EW | EE |  D   |  TA  |  TS  |  TM  |   0    |  PIdx   |   0   |
   2    1    1     2      2      2      2       6        11        3


Figure 13-25 CacheErr Register Format for Primary Data Cache Errors


EW: set when CacheErr register is already holding the values of a previous error

EE: tag error on an inconsistent block

D: data array error (way1 || way0)

TA: tag address array error (way1 || way0)

TS: tag state array error (way1 || way0)

TM: tag mod array error (way1 || way0)

PIdx: primary cache virtual doubleword index, VA[13:3]


0: Reserved. Must be written as zeroes, and returns zeroes when read. 
CacheErr Register Format for Secondary Cache Errors


Figure 13-26 shows the format of the CacheErr register when a secondary cache error
occurs.


 31 30  29   28  27 26  25 24  23   22      6  5     0
|  10 | EW | 0 |  D   |  TA  | 0 |   SIdx    |   0    |
   2    1    1    2      2     1      17         6


Figure 13-26 CacheErr Register Format for Secondary Cache Errors


EW: set when CacheErr register is already holding the values of a previous error

D: uncorrectable data array error (way1 || way0)

TA: uncorrectable tag array error (way1 || way0)

SIdx: secondary cache physical block index (PA[22:6] for 16-word block size or PA[22:7]
for 32-word block size)


Errata 


0: Reserved. Must be written as zeroes, and returns zeroes when read. 
CacheErr Register Format for System Interface Errors


Figure 13-27 shows the format of the CacheErr register when a System interface error
occurs.


 31 30  29   28   27 26  25   24   23   22      6  5     0
|  11 | EW | EE |  D   | SA | SC | SR |   SIdx    |   0    |
   2    1    1     2     1    1    1       17         6


Figure 13-27 CacheErr Register Format for System Interface Errors


EW: set when CacheErr register is already holding the values of a previous error

EE: data error on a CleanExclusive or DirtyExclusive

D: uncorrectable system block data response error (way1 || way0)

SA: uncorrectable system address bus error

SC: uncorrectable system command bus error

SR: uncorrectable system response bus error

SIdx: secondary cache physical block index


Errata 


0: Reserved. Must be written as zeroes, and returns zeroes when read. 
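Since bits 31:30 distinguish the four CacheErr formats, a Cache Error handler would typically decode them first. The C sketch below illustrates this under the layouts shown in Figures 13-24 through 13-27; the enumerator and function names are hypothetical.

```c
#include <stdint.h>

/* Error source, CacheErr bits 31:30. */
enum cache_err_source {
    CERR_PRIMARY_ICACHE = 0, /* 00: primary instruction cache */
    CERR_PRIMARY_DCACHE = 1, /* 01: primary data cache */
    CERR_SECONDARY      = 2, /* 10: secondary cache */
    CERR_SYSTEM_IF      = 3  /* 11: system interface */
};

/* Hypothetical decoders for a CacheErr value read with MFC0. */
enum cache_err_source cache_err_source(uint32_t cache_err)
{
    return (enum cache_err_source)(cache_err >> 30);
}

/* EW (bit 29): register was already holding a previous error. */
int cache_err_ew(uint32_t cache_err)
{
    return (int)((cache_err >> 29) & 1u);
}

/* SIdx (bits 22:6), valid for secondary cache and system interface errors. */
uint32_t cache_err_sidx(uint32_t cache_err)
{
    return (cache_err >> 6) & 0x1FFFFu;
}
```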
13.23 TagLo (28) and TagHi (29) Registers


The TagHi and TagLo registers are 32-bit read/write registers used to hold the following:

• the primary cache tag and parity
• the secondary cache tag and ECC
• the data in primary or secondary caches for certain CacheOps

TagHi/Lo formats in the R10000 processor differ from those in the R4400 due to changes
in CacheOps and cache architecture. R10000 formats depend on the type of CacheOp
executed and the cache to which it is applied. The reserved fields are read as zeroes after
executing an Index Load Tag or an Index Load Data CacheOp and ignored when executing
an Index Store Tag or an Index Store Data CacheOp.


To ensure NT kernel compatibility, the TagLo register is implemented as a 32-bit read/write
register. The value written by an MTC0 instruction can be retrieved by an MFC0 instruction,
unless an intervening CACHE instruction has modified the content.


This section gives the TagLo and TagHi register formats for the following CacheOp and 
cache combinations: 


• CacheOp is Index Load/Store Tag
  - primary instruction cache operation
  - primary data cache operation
  - secondary cache operation

• CacheOp is Index Load/Store Data
  - primary instruction cache operation
  - primary data cache operation
  - secondary cache operation


CacheOp is Index Load/Store Tag 


This section describes the three states of the TagLo and TagHi registers, when the CacheOp 
is an Index Load/Store Tag for the following operations: 


• primary instruction cache operation
• primary data cache operation
• secondary cache operation




Primary Instruction Cache Operation 


If the CacheOp is an Index Load/Store Tag for a primary instruction cache operation, the
fields of the TagHi and TagLo registers are defined as follows:


PTag0: contains physical address bits [35:12] stored in the cache tag 


PState: contains the primary instruction cache state for the line, as follows: 
1 = Valid 
0 = Invalid 


Errata 


LRU: indicates which way is the least recently used of the set. 


SP: state even parity bit for the PState field 

TP: tag even parity bit. 

PTag1: contains physical address bits [39:36] stored in the cache tag 
Figure 13-28 shows the fields of the TagHi and TagLo registers.


 31              8   7     6      5 4    3     2    1    0
|     PTag0        | 0 | PState |  0  | LRU | SP | 0 | TP |   TagLo
        24           1     1       2     1     1    1    1

 31                              4  3       0
|               0                 |  PTag1   |   TagHi
                28                     4


Figure 13-28 TagHi/Lo Register Fields in Primary Instruction Cache 
When CacheOp is Index Load/Store Tag 


0: Reserved. Must be written as zeroes, and returns zeroes when read. 


Primary Data Cache Operation 


If the CacheOp is an Index Load/Store Tag for primary data cache operations, the fields of
the TagHi and TagLo registers are defined as follows:


State Modifier: holds the status of the line, as follows:

001₂ = neither refilled nor written

010₂ = this line may have been written and may be inconsistent with the secondary cache (W
bit)

100₂ = this line is being refilled (Refill bit).

PTag1: contains physical address bits [39:36] stored in the cache tag

PTag0: contains physical address bits [35:12] stored in the cache tag
PState: together with the Refill bit of the State Modifier in the TagHi register, PState
determines the state of the cache block in the primary data cache, as shown in Table 13-24.


Table 13-24 PState Field Definition in TagHi/Lo Registers, For Primary Data Cache Operation
When CacheOp is Index Load/Store Tag


PState   Refill=0          Refill=1
00₂      Invalid           Refill clean (block is being refilled)
01₂      Shared            Upgrade Share (converting shared to dirty)
10₂      Clean Exclusive   Upgrade Clean (converting clean to dirty)
11₂      Dirty Exclusive   Refill dirty (block is being refilled for a store)


LRU: indicates which way is the least recently used of the set. 


SP: state even parity bit for the PState field and the Way bit 
Way: indicates which secondary cache set contains the primary cache line for this tag 
TP: tag even parity bit. 


0: Reserved. Must be written as zeroes, and returns zeroes when read. 


Figure 13-29 shows the fields of the TagHi and TagLo registers.


 31              8  7    6   5 4    3     2     1     0
|     PTag0        | PState |  0  | LRU | SP | Way | TP |   TagLo
        24             2       2     1     1     1     1

 31          29  28            4  3      0
| State Modifier |      0       |  PTag1  |   TagHi
        3               25           4


Figure 13-29 TagHi/Lo Register Fields in Primary Data Cache
When CacheOp is Index Load/Store Tag


Secondary Cache Operation


If the CacheOp is an Index Load/Store Tag for secondary cache operations, the fields of the
TagHi and TagLo registers are defined as follows:
STag0: contains physical address bits [35:18] stored in the cache tag


SState: contains the secondary cache state of the line, as follows:

00₂ = Invalid
01₂ = Shared
10₂ = Clean Exclusive
11₂ = Dirty Exclusive


VIndex (virtual index): contains only two bits of significance since the 32 Kbyte 2-way set
associative primary caches are addressed using only two untranslated address bits
(VA[13:12]) plus the offset within the virtual page.

ECC: contains the ECC for the STag, SState and VIndex fields.


MRU: indicates which way was the most recently used in the set.


STag1: contains the physical address bits [39:36] stored in the cache tag.


0: Reserved. Must be written as zeroes, and returns zeroes when read. 


Figure 13-30 shows the fields of the TagHi and TagLo registers. 


TagLo
Bits:   31:14   13:12   11:10    9   8:7      6:0
Field:  STag0   0       SState   0   VIndex   ECC
Width:  18      2       2        1   2        7

TagHi
Bits:   31    30:4   3:0
Field:  MRU   0      STag1
Width:  1     27     4


Figure 13-30 TagHi/Lo Register Fields in Secondary Cache 
When CacheOp is Index Load/Store Tag 
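
As an illustration of how these fields fit together, the following sketch reassembles the tagged physical address bits [39:18] from STag1 and STag0 and decodes the remaining fields. The function name and the returned dictionary keys are hypothetical; the bit positions are those of Figure 13-30.

```python
# Illustrative sketch only: names are hypothetical; bit positions follow Figure 13-30.
SSTATE_NAMES = {
    0b00: "Invalid",
    0b01: "Shared",
    0b10: "Clean Exclusive",
    0b11: "Dirty Exclusive",
}


def decode_scache_tag(tag_lo, tag_hi):
    """Split 32-bit TagLo/TagHi values into secondary cache tag fields."""
    stag0 = (tag_lo >> 14) & 0x3FFFF    # TagLo[31:14], PA[35:18]
    stag1 = tag_hi & 0xF                # TagHi[3:0],   PA[39:36]
    return {
        # PA[39:18] placed back at its position in the physical address
        "pa_base": (stag1 << 36) | (stag0 << 18),
        "sstate": SSTATE_NAMES[(tag_lo >> 10) & 0x3],   # TagLo[11:10]
        "vindex": (tag_lo >> 7) & 0x3,                  # TagLo[8:7]
        "ecc": tag_lo & 0x7F,                           # TagLo[6:0]
        "mru": (tag_hi >> 31) & 1,                      # TagHi[31]
    }
```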


Figure 13-30, size of the STag0 field is revised.


CacheOp is Index Load/Store Data


This section describes the following three states of the TagLo and TagHi registers, when the
CacheOp is an Index Load/Store Data:

• primary instruction cache operation
• primary data cache operation
• secondary cache operation


Primary Instruction Cache Operation 


If the CacheOp is an Index Load/Store Data for the primary instruction cache, the TagHi 
register stores the most significant four bits of a 36-bit instruction, as shown in Figure 13- 
31; the rest of the instruction is stored in the TagLo register. 


TagLo
Bits:   31:0
Field:  Inst[31:0]
Width:  32

TagHi
Bits:   31:4   3:0
Field:  0      Inst[35:32]
Width:  28     4


Figure 13-31 TagHi/Lo Register Fields in Primary Instruction Cache 
When CacheOp is Index Load/Store Data 
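
The split of a 36-bit instruction across the two registers can be sketched as follows. The helper names are hypothetical and the code only models the bit layout of Figure 13-31; it does not perform a CacheOp.

```python
# Illustrative sketch only: helper names are hypothetical.
def split_inst36(inst):
    """Return (TagLo, TagHi) images for a 36-bit instruction value."""
    if not 0 <= inst < (1 << 36):
        raise ValueError("instruction value must fit in 36 bits")
    tag_lo = inst & 0xFFFFFFFF       # Inst[31:0]
    tag_hi = (inst >> 32) & 0xF      # Inst[35:32]; TagHi[31:4] stay zero
    return tag_lo, tag_hi


def join_inst36(tag_lo, tag_hi):
    """Rebuild the 36-bit instruction from TagLo and TagHi."""
    return ((tag_hi & 0xF) << 32) | (tag_lo & 0xFFFFFFFF)
```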


0: Reserved. Must be written as zeroes, and returns zeroes when read.


Primary Data Cache Operation


If the CacheOp is Index Load/Store Data for primary data cache, the TagHi register is not
used. The TagLo register contains a 32-bit data word for the cache operation, as shown in
Figure 13-32.


TagLo
Bits:   31:0
Field:  Data Word[31:0]
Width:  32

TagHi
Bits:   31:0
Field:  Not Used
Width:  32


Figure 13-32 TagHi/Lo Register Fields in Primary Data Cache 
When CacheOp is Index Load/Store Data 


Secondary Cache Operation 
If the CacheOp is Index Load/Store Data for the secondary cache, a doubleword of data is
required for the CacheOp. The TagHi register stores the upper 32 bits of the doubleword
and the TagLo register stores the lower 32 bits, as shown below in Figure 13-33.


TagLo
Bits:   31:0
Field:  Doubleword[31:0]
Width:  32

TagHi
Bits:   31:0
Field:  Doubleword[63:32]
Width:  32


Figure 13-33 TagHi/Lo Register Fields in Secondary Cache 
When CacheOp is Index Load/Store Data 
13.24 ErrorEPC Register (30)


The ErrorEPC register is similar to the EPC register, except that ErrorEPC is used on ECC
and parity error exceptions. It is also used to store the program counter (PC) on Reset, Soft
Reset, and nonmaskable interrupt (NMI) exceptions.


The read/write ErrorEPC register contains the virtual address at which instruction
processing can resume after servicing an error. Figure 13-34 shows the format of the
ErrorEPC register.

ErrorEPC Register
Bits:   63:0
Field:  ErrorEPC
Width:  64


Figure 13-34 ErrorEPC Register Format 


14. Floating-Point Unit 


This section describes the operation of the FPU, including the register definitions. 
The Floating-Point unit consists of the following functional units: 

• add unit
• multiply unit
• divide unit
• square-root unit


The add unit performs floating-point add and subtract, compare, and conversion
operations. Except for Convert Integer To Single-Precision (float), all operations have a
2-cycle latency and a 1-cycle repeat rate.


The multiply unit performs single-precision or double-precision multiplication with a
2-cycle latency and a 1-cycle repeat rate.


The divide and square-root units perform single- or double-precision operations. They have
long latencies and low repeat rates (20 to 40 cycles).
14.1 Floating Point Unit Operations


The floating-point add, multiply, divide, and square-root units read their operands and store
their results in the floating-point register file. Values are loaded to or stored from the
register file by the load/store and move units.


A logic diagram of floating-point operations is shown in Figure 14-1, in which data and
instructions are read from the secondary cache into the primary caches, and then into the
processor. There they are decoded and appended to the floating-point queue, passed into
the FP register file where each is dynamically issued to the appropriate functional unit.
After execution in the functional unit, results are stored, through the register file, in the
primary data cache.


[Figure 14-1: block diagram. Blocks shown: System Bus; Secondary Cache (512 Kbyte to
16 Mbyte); Refill / Copyback; Primary Instruction Cache (32 Kbyte, 2-way associative);
Primary Data Cache (32 Kbyte, 2-way associative); Instruction Decode/Branch Unit; Branch
Cache; Branch Address; Register Rename/Map; Active and Free Lists; FP Queue; FP Register
File (64-by-64); Multiply; FP Divide & SQRT.]


Figure 14-1 Logical Diagram of FP Operations 


The floating-point queue can issue one instruction to the adder unit and one instruction to 
the multiplier unit. The adder and multiplier each have two dedicated read ports and a 
dedicated write port in the floating-point register file. 


Because of their low repeat rates, the divide and square-root units do not have their own
issue port. Instead, they decode instructions issued to the multiplier unit and use its operand
registers and bypass logic. They appropriate a second cycle later to store their result.


When an instruction is issued, up to two operands are read from dedicated read ports in the 
floating-point register file. After the operation has been completed, the result can be written 
back into the register file using a dedicated write port. For the add and multiply units, this 
write occurs four cycles after its operands were read. 
14.2 Floating-Point Unit Control


The control of floating-point execution is shared by the following units:


• The floating-point queue determines operand dependencies and dynamically
issues instructions to the execution units. It also controls the destination
registers and register bypass.


• The execution units control the arithmetic operations and generate status.


• The graduate unit saves the status until the instructions graduate, and then it
updates the Floating-Point Status register.


Eliminate traps for Denorm/NaN FP inputs (R12000) 


The R10000 takes an Unimplemented Operation exception when the FPU receives a NaN or
Denorm as an input. The R12000 suppresses these traps whenever the FS bit is set in the FCSR
(ref. VR5000, VR10000 User's Manual: Instruction). With the bit set, the R12000 simply
passes NaNs and Denorms through. This change in no way affects the handling of
QNaNs and Denorms when they are produced; it only changes the way they are handled
when they are received as input operands.


Case of Denorm when the FS bit is set to 1: A Denorm received as an input to the FP unit
is flushed to zero before the FP unit begins to process the operand. The behavior of the unit
(when FS is 1) will be exactly that seen when the input is zero. Specifically, if the zero input
would itself cause a trap (due to divide by zero, for example), then that zero-generated
trap will be taken.

When a Denorm is seen at the input, the Inexact bit is set, except in the cases described 
below: 


The Inexact bit will not be set, even if FS=1 and a Denorm is seen on input, if the other 
input to the FP operation is a value which pre-determines the FP result (e.g. QNaN). When 
the result is not affected by the presence or absence of the Denorm input, the result is 
EXACT. Hence the Inexact bit should not be set, even if Flush to Zero mode is ON. 


Case of QNaNs when the FS bit is set to 1: A QNaN received as an input operand for an FP
unit will cause the unit to produce the standard QNaN (which is not necessarily the same as
the input QNaN). Note that FP units will not propagate the input QNaN to the output, but will
always produce the same, standard, QNaN.


When the FS bit is set to zero, the behavior will be exactly as in R10000. 
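
The input handling described above can be modelled with a small sketch. This is a software illustration under stated assumptions, not a description of the hardware: the function names are hypothetical, double precision is used throughout, the sign of a flushed Denorm is assumed to be preserved (the text does not say), and a QNaN in the other operand is the only "pre-determined result" case modelled.

```python
# Illustrative model only. Assumptions: hypothetical names, double precision,
# sign of a flushed Denorm preserved, QNaN is the only pre-determining operand.
import math
import struct


def is_denorm(x):
    """True if x is a nonzero double whose exponent field is zero."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0


def prepare_operands(a, b, fs):
    """Return (a', b', inexact, unimplemented_trap) for a two-operand FP op."""
    saw_denorm = is_denorm(a) or is_denorm(b)
    saw_nan = math.isnan(a) or math.isnan(b)
    if not fs:
        # FS = 0: a NaN or Denorm input raises Unimplemented Operation.
        return a, b, False, saw_denorm or saw_nan

    def flush(v):
        return math.copysign(0.0, v) if is_denorm(v) else v

    # FS = 1: Denorm inputs are flushed to zero. Inexact is set unless the
    # other operand (here: a NaN) already pre-determines the result.
    inexact = saw_denorm and not saw_nan
    return flush(a), flush(b), inexact, False
```

After this step the operation would proceed exactly as if the flushed operand had been zero, so a divide by a flushed Denorm would still raise the zero-generated trap.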
14.3 Floating-Point General Registers (FGRs)


32- and 64-Bit Operations


The Floating-Point Unit is the hardware implementation of Coprocessor 1 in the MIPS IV
Instruction Set Architecture. The MIPS IV ISA defines 32 logical floating-point general
registers (FGRs), as shown in Figure 14-2. Each FGR is 64 bits wide and can hold either
32-bit single-precision or 64-bit double-precision values. The hardware actually contains
64 physical 64-bit registers in the Floating-Point Register File, from which the 32 logical
registers are taken.


FP instructions use a 5-bit logical number to select an individual FGR. These logical
numbers are mapped to physical registers by the rename unit (in pipeline stage 2), before
the Floating-Point Unit executes them. Physical registers are selected using 6-bit
addresses.


The FR bit (26) in the Status register determines the number of logical floating-point 
registers available to the program, and it alters the operation of single-precision load/store 
instructions, as shown in Figure 14-2. 
• FR is reset to 0 for compatibility with earlier MIPS I and MIPS II ISAs, and
instructions use only the 16 physical even-numbered floating-point registers
(32 logical registers). Each logical register is 32 bits wide.
• FR is set to 1 for normal MIPS III and MIPS IV operations, and all 32 of the
64-bit logical registers are available.


[Figure 14-2: register diagram. Status bit FR = 0 (MIPS I and MIPS II compatible): sixteen
64-bit physical registers hold thirty-two 32-bit logical registers. Each even-numbered FGR
(#0, #2, ..., #30) occupies bits 31:0 and the following odd-numbered FGR (#1, #3, ..., #31)
occupies bits 63:32 of the even-numbered register; the odd-numbered registers are not
implemented. Status bit FR = 1 (MIPS III and MIPS IV only): thirty-two 64-bit registers;
FGR #0 through FGR #31 each occupy bits 63:0 of Register #0 through Register #31.]


Figure 14-2 Floating-Point Registers 
Load and Store Operations


When FR = 0, floating-point loads and stores operate as follows:


• A doubleword load or store is handled the same as if the FR bit was set to 1,
as long as the register selected is even (0, 2, 4, etc.).


• If the register selected is odd, the load/store is invalid.

These operations are shown in Figure 14-3. Singleword loads/stores to even and odd
registers are also shown.
[Figure 14-3: FR=0, 16-register mode. Doubleword load/store: same as FR=1 if the register
is even, else invalid. Singleword load/store when the register is even: LWC1 ft,address
(MTC1 ft,rs) loads the 32-bit value into bits 31:0 and leaves bits 63:32 unchanged; SWC1
ft,address (MFC1 rt,fs) stores bits 31:0. Singleword load/store when the register is odd:
LWC1 ft,address (MTC1 ft,rs) loads the 32-bit value into bits 63:32 and leaves bits 31:0
unchanged; SWC1 ft,address (MFC1 rt,fs) stores bits 63:32. Move to/from selects an integer
register instead of memory. Moved 32-bit data is sign-extended in the 64-bit register.]


Figure 14-3 Loading and Storing Floating-Point Registers in 16-Register Mode 


NOTE: Move (MOV) and conditional move (MOVC, MOVN, MOVZ) are included
in these arithmetic operations, although no arithmetic is actually performed.


When FR = 1, floating-point loads and stores operate as follows:


• Single-precision operands are read from the low half of a register, and the
upper half is ignored. Single-precision results are written into the low half of
the register. The high half of the result register is architecturally undefined; in
the R10000 implementation, it is set to zero.

• Double-precision arithmetic operations use the entire 64-bit contents of each
operand or result register.


Because of register renaming, every new result is written into a temporary register, and 
conditional move instructions select between a new operand and the previous old value. 
The high half of the destination register of a single-precision conditional move instruction 
is undefined (shown in Figure 14-5), even if no move occurs. 


Singleword and doubleword loads and stores with the FPU in 32-register mode (FR=1) are 
shown in Figure 14-4. 


[Figure 14-4: FR=1, 32-register mode. Doubleword load/store: LDC1 ft,address (DMTC1 ft,rs)
loads a 64-bit value into bits 63:0; SDC1 ft,address (DMFC1 rt,fs) stores bits 63:0 to
memory (or a 64-bit register). Singleword load/store: SWC1 ft,address transfers bits 31:0.
Move to/from selects an integer register instead of memory. Moved 32-bit data is
sign-extended in the 64-bit register.]


Figure 14-4 Loading and Storing Floating-Point Registers in 32-Register Mode


Doubleword load, store and move to/from instructions load or store an entire 64-bit
floating-point register, as shown in Figure 14-5.


[Figure 14-5: 32-bit Single-Precision: bits 63:32 are zero or undefined, bits 31:0 hold the
32-bit value. 64-bit Double-Precision: bits 63:0 hold the 64-bit result value. In MIPS I and
II ISA, arithmetic operations are valid only for even-numbered registers.]


Figure 14-5 Operators on Floating-Point Registers 


In MIPS I and MIPS II ISAs, all arithmetic instructions, whether single- or
double-precision, are limited to using even register numbers. Load, store and move instructions
transfer only a single word. Even and odd register numbers are used to access the low and 
high halves, respectively, of double-precision registers. When storing a floating-point 
register (SWC1 or MFC1), the processor reads the entire register but writes only the 
selected half to memory or to an integer register. 


Because the register renaming scheme creates a new physical register for every destination,
it is not sufficient just to enable writing half of the Floating-Point register file when loading
(LWC1 or MTC1); the unchanged half must also be copied into the destination. This old
value is read using the shared read port and merged with the new word, and the merged
doubleword value is then written. (A write to the register file writes all 64 bits in parallel.)


When instructions are renamed in MIPS I or II, the low bit of any FGR field is forced to 
zero. Thus, each even/odd logical register number pair is treated as an even-numbered 
double-precision register. Odd numbered logical registers are not used in the mapping 
tables and dependency logic, but they remain mapped to their latest physical registers. 
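
The even/odd pairing and the merge-on-load behavior described above can be sketched with a toy register-file model. It is a simplification under stated assumptions: the class and method names are hypothetical, renaming is not modelled (each even-numbered FGR is simply one 64-bit slot), and only the FR = 0 case is covered.

```python
# Toy model of 16-register mode (FR = 0). Hypothetical names; register
# renaming itself is not modelled, only the even/odd pairing and the merge.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class FprFileFR0:
    def __init__(self):
        self.regs = [0] * 16          # one 64-bit value per even-numbered FGR

    def lwc1(self, ft, word):
        """Singleword load: merge the new word with the unchanged half."""
        idx = ft >> 1                 # low bit of the FGR number is dropped
        old = self.regs[idx]
        if ft & 1:                    # odd FGR selects the high half
            new = ((word & MASK32) << 32) | (old & MASK32)
        else:                         # even FGR selects the low half
            new = (old & (MASK32 << 32)) | (word & MASK32)
        self.regs[idx] = new          # all 64 bits are written together

    def swc1(self, ft):
        """Singleword store: read the whole register, return the selected half."""
        value = self.regs[ft >> 1]
        return (value >> 32) & MASK32 if ft & 1 else value & MASK32

    def ldc1(self, ft, dword):
        """Doubleword load: valid only for even-numbered registers."""
        if ft & 1:
            raise ValueError("doubleword access to an odd register is invalid")
        self.regs[ft >> 1] = dword & MASK64
```

For example, loading words into FGR 2 and then FGR 3 fills the low and high halves of the same 64-bit register, which a later double-precision operation on register 2 would read as a single value.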
14.4 Floating-Point Control Registers

The MIPS IV ISA permits up to 32 control registers to be defined for each coprocessor, but
the Floating-Point Unit uses only two:

• Control register 0, the FP Implementation and Revision register
• Control register 31, the Floating-Point Status register (FSR)


Floating-Point Implementation and Revision Register 
The following fields are defined for control register 0 in Coprocessor 1, the FP 
Implementation and Revision register, as shown in Figure 14-6: 


• The Implementation field holds an 8-bit number, 0x09, which identifies the
R10000 implementation of the floating point coprocessor.


• The Revision field is an 8-bit number that defines a particular revision of the
floating point coprocessor. Since it can be arbitrarily changed, it is not
defined here.


Implementation and Revision Register
Bits:   31:16   15:8         7:0
Field:  0       Imp (0x09)   Rev
Width:  16      8            8


Figure 14-6 FP Implementation and Revision Register Format 
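
A minimal sketch of decoding this register follows. The function names are hypothetical; the value passed in would be whatever a CFC1 read of control register 0 returns.

```python
# Illustrative sketch only: hypothetical helper names, layout from Figure 14-6.
def decode_fp_imp_rev(value):
    """Return the Imp and Rev fields of FP control register 0."""
    return {"imp": (value >> 8) & 0xFF, "rev": value & 0xFF}


def is_r10000_fpu(value):
    """True if the Implementation field holds 0x09."""
    return decode_fp_imp_rev(value)["imp"] == 0x09
```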
Floating-Point Status Register (FSR)


Figure 14-7 shows the Floating-Point Status register (FSR), control register 31 in
Coprocessor 1. It is implemented in the graduation unit rather than the Floating-Point Unit,
because it is closely tied to the active list.


Bits 22:18 are unimplemented and must be set to zero. All other bits may be read or written
using Control Move instructions from or to Coprocessor 1 (subfunctions CFC1 or CTC1).

These move instructions are fully interlocked; they are delayed in the decode stage until all 
previous instructions have been graduated, and no subsequent instruction is decoded until 
they have been completed. 


FP Status Register

Bits:   31:25      24   23       22:18   17:12         11:7        6:2         1:0
Field:  Cond 7-1   FS   Cond 0   0       Cause         Enables     Flags       RM
                                         E V Z O U I   V Z O U I   V Z O U I
Width:  7          1    1        5       6             5           5           2


Condition bits (7..0) are True/False values set by floating-point compare instructions.
Flush (FS) bit:
0: A denormalized result causes an Unimplemented Operation exception.
1: A denormalized result is replaced with zero. No exception is flagged.
Cause bits indicate the status of each floating-point arithmetic instruction. (Not by load, store, or move.)
Enable bits enable an exception if the corresponding Cause bit is set.
Flag bits are set whenever the corresponding Cause bit is a 1. These bits are cumulative. Once a bit is set, it
remains set until the FSR is written by a CTC1 instruction.
E Unimplemented operation. This exception is always enabled.
IEEE 754 Exception bits: The following bits may be individually enabled:
V Invalid operation.
Z Division by zero. (Divide unit only.)
O Overflow.
U Underflow.
I Inexact operation. (Result cannot be stored precisely.)
Round Mode (RM): (IEEE specification)
0: RN, Round to nearest representable value. If two values are equally near,
set the lowest bit to zero.
1: RZ, Round toward Zero. Round to the closest value whose magnitude is not greater than
the result.
2: RP, Round to Plus Infinity. Round to the closest value that is not less than
the result.
3: RM, Round to Minus Infinity. Round to the closest value that is not greater than
the result.


Figure 14-7 Floating-Point Status Register (FSR)


Bit Descriptions of the FSR


Descriptions of the bits in the FSR are as follows:


Condition Bits [31:25,23]: The Condition bits indicate the result of floating-point compare 
instructions. The active list keeps track of these bits. 


Cause Bits [17:12]: Each functional unit can detect exceptional cases in its function
codes, operands, or results. These cases are indicated by setting one of six specific Cause
bits. The Cause bits indicate the status of the floating-point arithmetic instruction which
graduated most recently or caused an exception to be taken. The FSR is not modified by
load, store, or move instructions. All cause bits, except E, have corresponding Enable and
Flag bits in the FSR.


E  Unimplemented operation: the execution unit does not perform the specified
operation. This exception is always enabled.


V  Invalid operation: this operation is not valid for the given operands.


Z  Division by zero: (divide unit only) the result of division by zero is not
defined.


O  Overflow: the result is too large in magnitude to be correctly represented in
the result format.


U  Underflow: the result is too small in magnitude to be correctly represented in
the result format.


I  Inexact Result: the result cannot be represented exactly.


NOTE: The FSR is modified only for instructions issued by the floating-point queue.
Move From (MFC or DMFC) instructions never set the Cause field; status bits from
the functional unit (multiplier) must be ignored. Move or Move Conditional
instructions can set the Unimplemented Operation exception only in the Cause field.
(Load and store instructions are issued by the address queue.)


The functional units generate the Cause bits and send them to the graduation unit when the 
operation is completed. 


Enable Bits [11:7]: The five Enable bits individually enable (when set to a 1) or disable 
(when set to a 0) exceptions when the corresponding Cause bit is set. 


Flag Bits [6:2]: One of the five Flag bits is set when a floating-point arithmetic instruction 
graduates, if the corresponding Cause bit is set. The Flag bits are sticky and remain set 
until the FSR is written. Thus, the Flag bits indicate the status of all floating-point 
instructions graduated since the FSR was last written. The Flag bits are not modified for 
any instructions which cause an exception to be taken. 


Round Mode [1:0]: RM bits select one of the four IEEE rounding modes. Most floating-point
results cannot be precisely represented by the 32-bit or 64-bit register formats, and
must be truncated and rounded to a representable value. The modes selected by the RM bit
values are:


0: RN, round to nearest representable value. If two values are equally near, set the
lowest bit to zero.


1: RZ, round toward zero. Round to the closest value whose magnitude is not greater
than the result.


2: RP, round to plus infinity. Round to the closest value that is not less
than the result.


3: RM, round to minus infinity. Round to the closest value that is not
greater than the result.


The Round and Enable bits only change when the FSR is written by a CTC1 (Move To
Coprocessor 1 Control Register) instruction. Each CTC1 instruction is executed
sequentially, after all previous floating-point instructions have been completed, so these
FSR bits do not change while any floating-point instruction is active. These bits are
broadcast from the graduation unit to all the floating-point functional units.


When a Cause bit is set and its corresponding Enable bit is also set, an exception is taken
on the instruction. The result of the instruction is not stored, and the Flag bits are not
changed. If no exception is taken, the corresponding Flag bits are set.


The Cause and Flag bits may be read or written. If a CTC1 instruction sets both a Cause
bit and its Enable bit, an exception is taken immediately. The FSR is written, but the
exception is reported on the move instruction.


Loading the FSR


The FSR may be loaded from an integer register by a CTC1 instruction which selects
control register 31. This instruction is executed serially; that is, it is delayed during decode
until the entire pipeline has emptied, and it is completed before the next instruction is
decoded. This instruction writes all FSR bits.


If any Cause bit and its corresponding Enable bit are both set, an exception is taken after
FSR has been modified. The CTC1 instruction is aborted; it does not graduate, even though
it has changed the processor state.
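
The interaction of the Cause, Enable, and Flag fields can be summarized in a short sketch. It is a software model under stated assumptions, not the graduation-unit logic: the function names are hypothetical, and the bit positions are those given in the FSR bit descriptions (Condition [31:25,23], FS 24, Cause [17:12], Enable [11:7], Flag [6:2], RM [1:0]).

```python
# Illustrative model only: hypothetical names; field positions follow the FSR
# bit descriptions. Within the 6-bit Cause field the order is E V Z O U I.
E_BIT = 1 << 5


def fsr_fields(fsr):
    """Extract the FSR fields from a 32-bit value."""
    cond = (((fsr >> 25) & 0x7F) << 1) | ((fsr >> 23) & 1)   # condition bits 7..0
    return {
        "cond": cond,
        "fs": (fsr >> 24) & 1,
        "cause": (fsr >> 12) & 0x3F,
        "enable": (fsr >> 7) & 0x1F,
        "flags": (fsr >> 2) & 0x1F,
        "rm": fsr & 0x3,
    }


def graduate(fsr, cause):
    """Apply the 6-bit cause of an FP arithmetic instruction.

    Returns (new_fsr, exception_taken).
    """
    enable = (fsr >> 7) & 0x1F
    # The Cause field always reflects the most recent instruction.
    new_fsr = (fsr & ~(0x3F << 12)) | ((cause & 0x3F) << 12)
    # E is always enabled; V Z O U I trap only if the Enable bit is set.
    taken = bool(cause & E_BIT) or bool(cause & 0x1F & enable)
    if not taken:
        new_fsr |= (cause & 0x1F) << 2    # Flag bits are sticky
    return new_fsr, taken
```

Calling graduate repeatedly with no enables set accumulates Flag bits, which matches the description of the flags as sticky until the FSR is next written.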
15. Memory Management


This section describes the R10000 processor memory management, including:

• processor modes and exceptions
• virtual address space
• virtual address translation


Processor Operating Modes


The R10000 has three operating modes and two addressing modes. All are described in this
section.


The three operating modes are listed in order of decreasing system privilege: 


• Kernel mode (highest system privilege): can access and change any register.
The innermost core of the operating system runs in kernel mode.


• Supervisor mode: has fewer privileges and is used for less critical sections of
the operating system.


• User mode (lowest system privilege): prevents users from interfering with one
another.


The operating system (when in Kernel mode) selects among the three modes by writing
the KSU field of the Status register. The processor is forced into Kernel mode
when the processor is handling an error (the ERL bit is set) or an exception (the EXL bit is
set). Table 15-1 shows the selection of operating modes with respect to the KSU, EXL and
ERL bits.


Table 15-1 also shows how different instruction sets and addressing modes are enabled by 
the Status register’s XX, UX, SX and KX bits. A dash ( “-” ) in this table indicates a “don’t 
care.” For detailed information on the address spaces available in each mode, refer to 
section titled, “Virtual Address Space,” in this chapter. 


The R10000 processor was designed for use with the MIPS IV ISA; however, for
compatibility with earlier machines, the usable ISAs can be limited to either MIPS III or
MIPS I/II.


Table 15-1 Processor Modes


XX   KX   SX   UX   KSU   ERL   EXL   Description       ISA III*   ISA IV*   Addressing Mode
31   7    6    5    4:3   2     1                                            32-Bit/64-Bit
0    -    -    0    10    0     0     User mode         No         No        32
1    -    -    0    10    0     0     User mode         No         Yes       32
0    -    -    1    10    0     0     User mode         Yes        No        64
1    -    -    1    10    0     0     User mode         Yes        Yes       64
-    -    0    -    01    0     0     Supervisor mode   No         Yes       32
-    -    1    -    01    0     0     Supervisor mode   Yes        Yes       64
-    0    -    -    00    0     0     Kernel mode       Yes        Yes       32
-    1    -    -    00    0     0     Kernel mode       Yes        Yes       64
-    0    -    -    -     0     1     Exception Level   Yes        Yes       32
-    1    -    -    -     0     1     Exception Level   Yes        Yes       64
-    0    -    -    -     1     X     Error Level       Yes        Yes       32
-    1    -    -    -     1     X     Error Level       Yes        Yes       64


* No means the ISA is disabled; Yes means the ISA is enabled.

Dashes (-) are “don’t care.”
The processor’s addressing mode determines whether it generates 32-bit or 64-bit memory
addresses.


Refer to Table 15-1 for the following addressing mode encodings:


• In Kernel mode the KX bit allows 64-bit addressing; all instructions are
always valid.


• In Supervisor mode, the SX bit allows 64-bit addressing and the MIPS III
instructions. MIPS IV ISA is enabled all the time in Supervisor mode.


• In User mode, the UX bit allows 64-bit addressing and the MIPS III
instructions; the XX bit allows the new MIPS IV instructions.
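
The decoding in Table 15-1 can be expressed as a short sketch. The function name and return format are hypothetical; the bit positions (XX 31, KX 7, SX 6, UX 5, KSU 4:3, ERL 2, EXL 1) are those listed in the table. The KSU value 11 does not appear in the table, so the sketch rejects it rather than guess.

```python
# Illustrative sketch only: hypothetical function; bit positions from Table 15-1.
def decode_status_mode(status):
    """Return (mode, address_bits, mips3_enabled, mips4_enabled)."""
    xx = (status >> 31) & 1
    kx = (status >> 7) & 1
    sx = (status >> 6) & 1
    ux = (status >> 5) & 1
    ksu = (status >> 3) & 0x3
    erl = (status >> 2) & 1
    exl = (status >> 1) & 1

    if erl:                                   # ERL overrides EXL and KSU
        mode, wide, mips3, mips4 = "Error Level", kx, True, True
    elif exl:                                 # EXL overrides KSU
        mode, wide, mips3, mips4 = "Exception Level", kx, True, True
    elif ksu == 0b00:
        mode, wide, mips3, mips4 = "Kernel", kx, True, True
    elif ksu == 0b01:
        mode, wide, mips3, mips4 = "Supervisor", sx, bool(sx), True
    elif ksu == 0b10:
        mode, wide, mips3, mips4 = "User", ux, bool(ux), bool(xx)
    else:
        raise ValueError("KSU = 11 is not listed in Table 15-1")
    return mode, 64 if wide else 32, mips3, mips4
```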


15.2 Virtual Address Space 


The processor uses either 32-bit or 64-bit address spaces, depending on the operating and 
addressing modes set by the Status register. Table 15-1 lists the decoding of these modes. 


The processor uses the following addresses: 
• virtual address VA[43:0]
• region bits VA[63:59]


If a region is mapped, virtual addresses are translated in the TLB. Bits VA[58:44] are not 
translated in the TLB and are sign extensions of bit VA[43]. 


In both 32-bit and 64-bit address mode, the memory address space is divided into many 
regions, as shown in Figure 15-3. Each region has specific characteristics and uses. The 
user can access only the useg region in 32-bit mode, or xuseg in 64-bit mode, as shown in 
Figure 15-1. The supervisor can access user regions as well as sseg (in 32-bit mode) or 
xsseg and csseg (in 64-bit mode), shown in Figure 15-2. The kernel can access all regions 
except those restricted because bits VA[58:44] are not implemented in the TLB, as shown 
in Figure 15-3. 


The R10000 processor follows the R4400 implementation for data references only, 
ensuring compatibility with the NT kernel. If any of the upper 33 bits are nonzero for an 
instruction fetch, an Address Error is generated. Refer to Table 15-2 for delineation of the 
address spaces.


User Mode Operations


In User mode, a single, uniform virtual address space, labelled User segment, is
available; its size is:

• 2 Gbytes (2^31 bytes) in 32-bit mode (useg)
• 16 Tbytes (2^44 bytes) in 64-bit mode (xuseg)


Figure 15-1 shows User mode virtual address space. 


32-bit (KSU = 10₂ and EXL = 0 and ERL = 0 and UX = 0)

0xFFFF FFFF to 0x8000 0000    Address Error
0x7FFF FFFF to 0x0000 0000    2 Gbytes, Mapped (useg)

64-bit (KSU = 10₂ and EXL = 0 and ERL = 0 and UX = 1)

0xFFFF FFFF FFFF FFFF to 0x0000 1000 0000 0000    Address Error
0x0000 0FFF FFFF FFFF to 0x0000 0000 0000 0000    16 Tbytes, Mapped (xuseg)


Figure 15-1 User Mode Virtual Address Space


The User segment starts at address 0 and the current active user process resides in either
useg (in 32-bit mode) or xuseg (in 64-bit mode). The TLB identically maps all references
to useg/xuseg from all modes, and controls cache accessibility.


32-bit User Mode (useg)


In User mode, when UX = 0 in the Status register, User mode addressing is compatible with 
the 32-bit addressing model shown in Figure 15-1, and a 2-Gbyte user address space is 
available, labelled useg. 


All valid User mode virtual addresses have their most-significant bit cleared to 0; any 
attempt to reference an address with the most-significant bit set while in User mode causes 
an Address Error exception. 


The system maps all references to useg through the TLB, and bit settings within the TLB 
entry for the page determine the cacheability of a reference. 


64-bit User Mode (xuseg) 


In User mode, when UX = 1 in the Status register, User mode addressing is extended to the
64-bit model shown in Figure 15-1. In 64-bit User mode, the processor provides a single,
uniform virtual address space of 2^44 bytes, labelled xuseg.


All valid User mode virtual addresses have bits 63:44 equal to 0; an attempt to reference an 
address with bits 63:44 not equal to 0 causes an Address Error exception. 


Although the system may be in 32-bit mode, address logic still generates 64-bit values. In 
this case the high 32 bits must equal the sign bit (31), or an Address Error exception is 
taken. 
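
The User mode address checks described above can be sketched as a single predicate. The function name is hypothetical, and the address is treated as the 64-bit value generated by the address logic.

```python
# Illustrative sketch only: hypothetical function modelling the User mode
# address checks for useg (UX = 0) and xuseg (UX = 1).
def user_address_ok(va, ux):
    """Return True if a 64-bit virtual address is valid in User mode."""
    va &= (1 << 64) - 1
    if ux:
        # 64-bit mode: bits 63:44 must all be zero.
        return (va >> 44) == 0
    # 32-bit mode: the high 32 bits must equal the sign bit (31), and a
    # valid user address has bit 31 cleared.
    sign = (va >> 31) & 1
    high = va >> 32
    sign_extended = high == (0xFFFFFFFF if sign else 0)
    return sign_extended and sign == 0
```

A False result corresponds to the Address Error exception described in the text.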
Supervisor Mode Operations


Supervisor mode is designed for layered operating systems in which a true kernel runs in 
processor Kernel mode, and the rest of the operating system runs in Supervisor mode. 


The processor operates in Supervisor mode when the Status register contains the 
Supervisor-mode bit-values shown in Table 15-1. 


Figure 15-2 shows Supervisor mode address mapping. 


32-bit (KSU = 01₂ and EXL = 0 and ERL = 0 and SX = 0)

0xFFFF FFFF to 0xE000 0000    Address Error
0xDFFF FFFF to 0xC000 0000    0.5 Gbytes, Mapped (sseg)
0xBFFF FFFF to 0x8000 0000    Address Error
0x7FFF FFFF to 0x0000 0000    2 Gbytes, Mapped (suseg)

64-bit (KSU = 01₂ and EXL = 0 and ERL = 0 and SX = 1)

0xFFFF FFFF FFFF FFFF to 0xFFFF FFFF E000 0000    Address Error
0xFFFF FFFF DFFF FFFF to 0xFFFF FFFF C000 0000    0.5 Gbytes, Mapped (csseg)
0xFFFF FFFF BFFF FFFF to 0x4000 1000 0000 0000    Address Error
0x4000 0FFF FFFF FFFF to 0x4000 0000 0000 0000    16 Tbytes, Mapped (xsseg)
0x3FFF FFFF FFFF FFFF to 0x0000 1000 0000 0000    Address Error
0x0000 0FFF FFFF FFFF to 0x0000 0000 8000 0000    16 Tbytes, Mapped (xsuseg); Address Error if UX = 0
0x0000 0000 7FFF FFFF to 0x0000 0000 0000 0000    Mapped (xsuseg)


Figure 15-2. Supervisor Mode Address Space 


32-bit Supervisor Mode, User Space (suseg) 


In Supervisor mode, when SX = 0 in the Status register and the most-significant bit of the
32-bit virtual address is set to 0, the suseg virtual address space is selected; it covers the full
2^31 bytes (2 Gbytes) of the current user address space. The virtual address is extended with
the contents of the 8-bit ASID field to form a unique virtual address.


This mapped space starts at virtual address 0x0000 0000 and runs through 0x7FFF FFFF. 


32-bit Supervisor Mode, Supervisor Space (sseg) 


In Supervisor mode, when SX = 0 in the Status register and the three most-significant bits 
of the 32-bit virtual address are 110₂, the sseg virtual address space is selected; it covers 
2^29 bytes (512 Mbytes) of the current supervisor address space. The virtual address is 
extended with the contents of the 8-bit ASID field to form a unique virtual address. 


This mapped space begins at virtual address 0xC000 0000 and runs through 0xDFFF FFFF. 


64-bit Supervisor Mode, User Space (xsuseg) 


In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual address 
are set to 00₂, selection of the xsuseg virtual address space is dependent upon the UX bit. 


• If UX = 1, the entire space from 0x0000 0000 0000 0000 through 
  0x0000 0FFF FFFF FFFF (16 Tbytes) is selected. 


• If UX = 0, the address space 0x0000 0000 0000 0000 through 
  0x0000 0000 7FFF FFFF (2 Gbytes) is selected. Addressing the space ranging from 
  0x0000 0000 8000 0000 through 0x0000 0FFF FFFF FFFF will cause an address error. 


The virtual address is extended with the contents of the 8-bit ASID field to form a unique 
virtual address. 


64-bit Supervisor Mode, Current Supervisor Space (xsseg) 


In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual address 
are set to 01₂, the xsseg current supervisor virtual address space is selected. The virtual 
address is extended with the contents of the 8-bit ASID field to form a unique virtual 
address. 


This mapped space begins at virtual address 0x4000 0000 0000 0000 and runs through 
0x4000 0FFF FFFF FFFF. 


64-bit Supervisor Mode, Separate Supervisor Space (csseg) 


In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual address 
are set to 11₂, the csseg separate supervisor virtual address space is selected. Addressing 
of the csseg is compatible with addressing sseg in 32-bit mode. The virtual address is 
extended with the contents of the 8-bit ASID field to form a unique virtual address. 


This mapped space begins at virtual address 0xFFFF FFFF C000 0000 and runs through 
0xFFFF FFFF DFFF FFFF. 
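

The Supervisor mode segments described above can be summarized as a decode routine. The following is a minimal sketch, not a description of the hardware logic: the enum and function names are hypothetical, and it assumes that every address outside the listed segments causes an Address Error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the Supervisor mode segments in this section. */
typedef enum {
    SEG_ADDRESS_ERROR, SEG_SUSEG, SEG_SSEG,
    SEG_XSUSEG, SEG_XSSEG, SEG_CSSEG
} sup_segment;

/* Classify a virtual address referenced in Supervisor mode.
 * sx and ux are the SX and UX bits of the Status register. */
sup_segment supervisor_segment(uint64_t va, bool sx, bool ux)
{
    if (!sx) {
        /* 32-bit mode: bits 63:32 must be copies of bit 31. */
        uint32_t lo = (uint32_t)va;
        uint64_t ext = (lo & 0x80000000u) ? (0xFFFFFFFF00000000ULL | lo) : lo;
        if (ext != va)
            return SEG_ADDRESS_ERROR;
        if (lo <= 0x7FFFFFFFu)
            return SEG_SUSEG;                       /* bit 31 = 0 */
        if (lo >= 0xC0000000u && lo <= 0xDFFFFFFFu)
            return SEG_SSEG;                        /* top bits 110 */
        return SEG_ADDRESS_ERROR;
    }
    /* 64-bit mode */
    if (va <= 0x000000007FFFFFFFULL)
        return SEG_XSUSEG;                          /* low 2 Gbytes */
    if (va <= 0x00000FFFFFFFFFFFULL)
        return ux ? SEG_XSUSEG : SEG_ADDRESS_ERROR; /* needs UX = 1 */
    if (va >= 0x4000000000000000ULL && va <= 0x40000FFFFFFFFFFFULL)
        return SEG_XSSEG;
    if (va >= 0xFFFFFFFFC0000000ULL && va <= 0xFFFFFFFFDFFFFFFFULL)
        return SEG_CSSEG;
    return SEG_ADDRESS_ERROR;
}
```

With SX = 1 and UX = 0, for instance, a reference to 0x0000 0000 8000 0000 is classified as an address error, while the same reference with UX = 1 falls in xsuseg.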


Kernel Mode Operations 


The processor operates in Kernel mode when the Status register contains the Kernel-mode 
bit-values shown in Table 15-1. 


Kernel mode virtual address space is divided into regions differentiated by the high-order 
bits of the virtual address, as shown in Figure 15-3. 


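32-bit Kernel mode ((KSU = 00 or EXL = 1 or ERL = 1) and KX = 0)

    Start          End            Size         Segment
    0xE000 0000    0xFFFF FFFF    0.5 Gbytes   kseg3 (Mapped)
    0xC000 0000    0xDFFF FFFF    0.5 Gbytes   ksseg (Mapped)
    0xA000 0000    0xBFFF FFFF    0.5 Gbytes   kseg1 (Unmapped, Uncached)
    0x8000 0000    0x9FFF FFFF    0.5 Gbytes   kseg0 (Unmapped, Cached)
    0x0000 0000    0x7FFF FFFF    2 Gbytes     kuseg (Mapped; see Note below)

64-bit Kernel mode ((KSU = 00 or EXL = 1 or ERL = 1) and KX = 1)

    Start                    End                      Size         Segment
    0xFFFF FFFF E000 0000    0xFFFF FFFF FFFF FFFF    0.5 Gbytes   ckseg3 (Mapped)
    0xFFFF FFFF C000 0000    0xFFFF FFFF DFFF FFFF    0.5 Gbytes   cksseg (Mapped)
    0xFFFF FFFF A000 0000    0xFFFF FFFF BFFF FFFF    0.5 Gbytes   ckseg1 (Unmapped, Uncached)
    0xFFFF FFFF 8000 0000    0xFFFF FFFF 9FFF FFFF    0.5 Gbytes   ckseg0 (Unmapped, Cached)
    0xC000 0000 0000 0000    0xC000 0FFE FFFF FFFF                 xkseg (Mapped)
    0x8000 0000 0000 0000    0xBFFF FFFF FFFF FFFF                 xkphys (Unmapped; see Figure 15-4)
    0x4000 0000 0000 0000    0x4000 0FFF FFFF FFFF    16 Tbytes    xksseg (Mapped; Address Error if SX = 0)
    0x0000 0000 0000 0000    0x0000 0FFF FFFF FFFF    16 Tbytes    xkuseg (Mapped; see Note below)

In xkuseg, addresses from 0x0000 0000 8000 0000 upward cause an Address Error if UX = 0 or
ERL = 1.


Figure 15-3 Kernel Mode Address Space 


NOTE: If ERL = 1, the selected 2 Gbyte space becomes uncached and unmapped. 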


32-bit Kernel Mode, User Space (kuseg) 


In Kernel mode, when KX = 0 in the Status register, and the most-significant bit of the 
virtual address, A31, is cleared, the 32-bit kuseg virtual address space is selected; it covers 
the full 2^31 bytes (2 Gbytes) of the current user address space. The virtual address is 
extended with the contents of the 8-bit ASID field to form a unique virtual address. 


32-bit Kernel Mode, Kernel Space 0 (kseg0) 


In Kernel mode, when KX = 0 in the Status register and the most-significant three bits of 
the virtual address are 100₂, 32-bit kseg0 virtual address space is selected; it is the 2^29-byte 
(512-Mbyte) kernel physical space. References to kseg0 are not mapped through the TLB; 
the physical address is selected by subtracting 0x8000 0000 from the virtual address. The 
K0 field of the Config register determines cacheability and coherency. 


32-bit Kernel Mode, Kernel Space 1 (kseg1) 


In Kernel mode, when KX = 0 in the Status register and the most-significant three bits of 
the 32-bit virtual address are 101₂, 32-bit kseg1 virtual address space is selected; it is the 
2^29-byte (512-Mbyte) kernel physical space. 


References to kseg1 are not mapped through the TLB; the physical address is selected by 
subtracting 0xA000 0000 from the virtual address. 


Caches are disabled for accesses to these addresses, and physical memory (or memory-mapped 
I/O device registers) is accessed directly. 


32-bit Kernel Mode, Supervisor Space (ksseg) 


In Kernel mode, when KX = 0 in the Status register and the most-significant three bits of 
the 32-bit virtual address are 110₂, the ksseg virtual address space is selected; it is the 
current 2^29-byte (512-Mbyte) supervisor virtual space. The virtual address is extended with 
the contents of the 8-bit ASID field to form a unique virtual address. 


References to ksseg are mapped through the TLB. 


32-bit Kernel Mode, Kernel Space 3 (kseg3) 


In Kernel mode, when KX = 0 in the Status register and the most-significant three bits of 
the 32-bit virtual address are 111₂, the kseg3 virtual address space is selected; it is the 
current 2^29-byte (512-Mbyte) kernel virtual space. The virtual address is extended with the 
contents of the 8-bit ASID field to form a unique virtual address. 


References to kseg3 are mapped through the TLB. 
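

As an illustration of the 32-bit Kernel mode segments described above, the sketch below selects a segment from the high-order address bits and computes the physical address for the two unmapped segments. The names are hypothetical, and the sketch does not model the ERL = 1 case, in which kuseg becomes unmapped and uncached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the 32-bit Kernel mode segments (KX = 0). */
typedef enum { KUSEG, KSEG0, KSEG1, KSSEG, KSEG3 } kseg32;

/* Select the segment from the most-significant address bits. */
kseg32 kernel32_segment(uint32_t va)
{
    if ((va >> 31) == 0)
        return KUSEG;              /* A31 cleared */
    switch (va >> 29) {            /* three most-significant bits */
    case 4:  return KSEG0;         /* 100 */
    case 5:  return KSEG1;         /* 101 */
    case 6:  return KSSEG;         /* 110 */
    default: return KSEG3;         /* 111 */
    }
}

/* For kseg0 and kseg1, store the physical address in *pa and return true.
 * Other segments are mapped through the TLB, so return false. */
bool kernel32_unmapped_phys(uint32_t va, uint32_t *pa)
{
    switch (kernel32_segment(va)) {
    case KSEG0: *pa = va - 0x80000000u; return true;
    case KSEG1: *pa = va - 0xA0000000u; return true;
    default:    return false;
    }
}
```

A kseg1 reference to 0xBFC0 0000, for example, yields physical address 0x1FC0 0000 without consulting the TLB.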


64-bit Kernel Mode, User Space (xkuseg) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the 64-bit virtual 
address are 00₂, selection of the xkuseg virtual address space is dependent upon the UX and 
ERL bits. 


• If UX = 1 and ERL = 0, the entire space from 0x0000 0000 0000 0000 through 
  0x0000 0FFF FFFF FFFF (16 Tbytes) is selected. 


• If UX = 0 or ERL = 1, the address space 0x0000 0000 0000 0000 through 
  0x0000 0000 7FFF FFFF (2 Gbytes) is selected. Addressing the space ranging 
  from 0x0000 0000 8000 0000 through 0x0000 0FFF FFFF FFFF will cause an 
  address error. Moreover, if ERL = 1, the selected 2-Gbyte address space 
  becomes unmapped and uncached. 


The virtual address is extended with the contents of the 8-bit ASID field to form a unique 
virtual address. 


64-bit Kernel Mode, Current Supervisor Space (xksseg) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the 64-bit virtual 
address are 01₂, selection of the xksseg virtual address space is dependent upon the SX bit. 


• If SX = 1, the entire space from 0x4000 0000 0000 0000 through 
  0x4000 0FFF FFFF FFFF (16 Tbytes) is selected. 


• If SX = 0, access to any address in the space ranging from 0x4000 0000 0000 0000 
  through 0x4000 0FFF FFFF FFFF causes an address error. 


The virtual address is extended with the contents of the 8-bit ASID field to form a unique 
virtual address. 


64-bit Kernel Mode, Physical Spaces (xkphys) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the 64-bit virtual 
address are 10₂, the xkphys virtual address space is selected; it is a set of eight kernel 
physical spaces. Each kernel physical space contains either one or four 2^40-byte physical 
pages. 


References to this space are not mapped; the physical address selected is taken directly 
from bits 39:0 of the virtual address. Bits 61:59 of the virtual address specify the cache 
algorithm, described in Chapter 4, the section titled “Cache Algorithms.” If the cache 
algorithm is either uncached or uncached accelerated (values of 2 or 7), the space contains 
four physical pages; access to addresses whose bits 56:40 are not equal to 0 causes an 
Address Error exception. Address bits 58:57 carry the uncached attribute (described in 
Chapter 6, the section titled “Support for Uncached Attribute”), and are not checked for 
address errors. 


If the cache algorithm is neither uncached nor uncached accelerated, the space contains a 
single physical page, as on the R4400 processor. In this case, access to addresses whose 
bits 58:40 are not equal to zero causes an Address Error exception, as shown in Figure 15-4. 
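

The xkphys field layout described above can be illustrated with a short decode routine. This is a sketch under stated assumptions: the structure and function names are hypothetical, and reserved cache algorithm values are not treated specially.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of an xkphys reference. */
typedef struct {
    unsigned cache_algorithm;  /* virtual address bits 61:59 */
    unsigned uncached_attr;    /* bits 58:57, used only for algorithms 2 and 7 */
    uint64_t phys_addr;        /* bits 39:0 */
} xkphys_ref;

/* Decode an xkphys address. Returns false if the address is not in
 * xkphys (bits 63:62 != 10) or if it would cause an Address Error. */
bool xkphys_decode(uint64_t va, xkphys_ref *out)
{
    if ((va >> 62) != 2)
        return false;
    unsigned alg = (unsigned)((va >> 59) & 0x7);
    bool uncached = (alg == 2 || alg == 7);
    /* Uncached spaces check bits 56:40; all others check bits 58:40. */
    uint64_t checked = uncached ? ((va >> 40) & 0x1FFFF)
                                : ((va >> 40) & 0x7FFFF);
    if (checked != 0)
        return false;
    out->cache_algorithm = alg;
    out->uncached_attr = uncached ? (unsigned)((va >> 57) & 0x3) : 0;
    out->phys_addr = va & 0xFFFFFFFFFFULL;
    return true;
}
```

For example, 0x9600 0000 0000 0000 decodes as cache algorithm 2 (uncached) with uncached attribute 3, whereas 0x9A00 0000 0000 0000 selects algorithm 3 with bit 57 set and is therefore rejected.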


    Start                  End                    Cache algorithm
    0xBE000000 00000000    0xBE0000FF FFFFFFFF    Uncached Accelerated
    0xBC000000 00000000    0xBC0000FF FFFFFFFF    Uncached Accelerated
    0xBA000000 00000000    0xBA0000FF FFFFFFFF    Uncached Accelerated
    0xB8000000 00000000    0xB80000FF FFFFFFFF    Uncached Accelerated
    0xB0000000 00000000    0xB00000FF FFFFFFFF    Reserved‡
    0xA8000000 00000000    0xA80000FF FFFFFFFF    Cacheable Exclusive Write
    0xA0000000 00000000    0xA00000FF FFFFFFFF    Cacheable Exclusive
    0x98000000 00000000    0x980000FF FFFFFFFF    Cacheable Noncoherent
    0x96000000 00000000    0x960000FF FFFFFFFF    Uncached
    0x94000000 00000000    0x940000FF FFFFFFFF    Uncached
    0x92000000 00000000    0x920000FF FFFFFFFF    Uncached
    0x90000000 00000000    0x900000FF FFFFFFFF    Uncached
    0x88000000 00000000    0x880000FF FFFFFFFF    Reserved‡
    0x80000000 00000000    0x800000FF FFFFFFFF    Reserved‡

Addresses between the end of one listed range and the start of the next (for example,
0x90000100 00000000 through 0x91FFFFFF FFFFFFFF) cause an Address Error.


‡ Accessing a reserved space results in undefined behavior. 


Figure 15-4 xkphys Virtual Address Space 


64-bit Kernel Mode, Kernel Space (xkseg) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the 64-bit virtual 
address are 11₂, the address space selected is one of the following: 


• kernel virtual space, xkseg, the current kernel virtual space; the virtual address 
  is extended with the contents of the 8-bit ASID field to form a unique virtual 
  address 


• one of the four 32-bit kernel mode compatibility spaces (described below). 


64-bit Kernel Mode, Compatibility Spaces (ckseg1:0, cksseg, ckseg3) 


In Kernel mode, when KX = 1 in the Status register, bits 63:62 of the 64-bit virtual address 
are 11₂, and bits 61:31 of the virtual address equal -1 (all ones), address bits 30:29, as 
shown in Figure 15-3, select one of the following 512-Mbyte compatibility spaces. 


• ckseg0. This 64-bit virtual address space is an unmapped region, compatible 
  with the 32-bit address model kseg0. The K0 field of the Config register 
  controls cacheability and coherency. 


• ckseg1. This 64-bit virtual address space is an unmapped and uncached 
  region, compatible with the 32-bit address model kseg1. 


• cksseg. This 64-bit virtual address space is the current supervisor virtual 
  space, compatible with the 32-bit address model ksseg. 


• ckseg3. This 64-bit virtual address space is kernel virtual space, compatible 
  with the 32-bit address model kseg3. 


Address Space Access Privilege Differences Between the R4400 and R10000 


In the R4400, the 64-bit Supervisor mode can access the entire xsuseg space, and the 64-bit 
Kernel mode can access the entire xksseg and xkuseg spaces. Access privileges in the 
R10000 are also dependent on the UX and SX bits: 


• Access to the 64-bit user space in 64-bit Supervisor or Kernel mode (xsuseg 
  or xkuseg) is controlled by the UX bit. If UX = 0, the 64-bit Supervisor and 
  Kernel modes can only access the 32-bit user space (suseg or kuseg). 


• Access to the 64-bit supervisor space in Kernel mode (xksseg) is controlled by 
  the SX bit. If SX = 0, the 64-bit Kernel mode can only access the 32-bit 
  supervisor space (ksseg). 


An Address Error exception is taken on an illegal access. 


The R10000 processor implements the same access privileges for 32-bit processor modes 
as in the R4400. Table 15-2 summarizes the access privileges for all processor modes 
in the R10000 processor. 


Table 15-2 Access Privileges for User, Supervisor and Kernel Mode Operations 


                                        32-bit Mode                   64-bit Mode
64-bit Virtual Address                  User*    Supervisor  Kernel   User     Supervisor       Kernel & ERL=0   Kernel & ERL=1
FFFFFFFF E0000000 to FFFFFFFF FFFFFFFF  AddrErr  AddrErr     OK       AddrErr  AddrErr          OK               OK
FFFFFFFF C0000000 to FFFFFFFF DFFFFFFF  AddrErr  OK          OK       AddrErr  OK               OK               OK
FFFFFFFF A0000000 to FFFFFFFF BFFFFFFF  AddrErr  AddrErr     OK       AddrErr  AddrErr          OK               OK
FFFFFFFF 80000000 to FFFFFFFF 9FFFFFFF  AddrErr  AddrErr     OK       AddrErr  AddrErr          OK               OK
C0000FFF 00000000 to FFFFFFFF 7FFFFFFF  AddrErr  AddrErr     AddrErr  AddrErr  AddrErr          AddrErr          AddrErr
C0000000 00000000 to C0000FFE FFFFFFFF  AddrErr  AddrErr     AddrErr  AddrErr  AddrErr          OK               OK
80000000 00000000 to BFFFFFFF FFFFFFFF  AddrErr  AddrErr     AddrErr  AddrErr  AddrErr          OK               OK
40001000 00000000 to 7FFFFFFF FFFFFFFF  AddrErr  AddrErr     AddrErr  AddrErr  AddrErr          AddrErr          AddrErr
40000000 00000000 to 40000FFF FFFFFFFF  AddrErr  AddrErr     AddrErr  AddrErr  OK               AddrErr if SX=0  AddrErr if SX=0
00001000 00000000 to 3FFFFFFF FFFFFFFF  AddrErr  AddrErr     AddrErr  AddrErr  AddrErr          AddrErr          AddrErr
00000000 80000000 to 00000FFF FFFFFFFF  AddrErr  AddrErr     AddrErr  OK       AddrErr if UX=0  AddrErr if UX=0  AddrErr
00000000 00000000 to 00000000 7FFFFFFF  OK       OK          OK       OK       OK               OK               OK


* For data references, the upper 32 bits of the virtual addresses are cleared before checking access privilege and TLB translation. 


15.3 Virtual Address Translation 


Virtual Pages 


Programs can operate using either physical or virtual memory addresses: 


• physical addresses correspond to hardware locations in main memory 


• virtual addresses are logical values only, and do not correspond to fixed 
  hardware locations 


Virtual addresses must first be translated (finding the physical address at which the virtual 
address points) before main memory can be accessed. This translation is essential for 
multitasking computer systems, because it allows the operating system to load programs 
anywhere in main memory independent of the logical addresses used by the programs. 


This translation also implements a memory protection scheme, which limits the amount of 
memory each program may access. The scheme prevents programs from interfering with 
the memory used by other programs or the operating system. 


Translated virtual addresses retrieve data in blocks, which are called pages. In the R10000 
processor, the size of each page may be selected from a range that runs from 4 Kbytes to 
16 Mbytes inclusive, in powers of 4 (that is, 4 Kbytes, 16 Kbytes, 64 Kbytes, etc.). 


The virtual address bits which select a page (and thus are translated) are called the page 
address. The lower bits which select a byte within the selected page are called the offset 
and are not translated. The number of offset bits varies from 12 to 24 bits, depending on 
the page size. 
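

The relationship between page size and offset width can be worked through in a few lines. In this sketch the index n (0 for 4 Kbytes up to 6 for 16 Mbytes) is a hypothetical convenience and is not the PageMask register encoding.

```c
#include <stdint.h>

/* n = 0..6 selects 4 Kbytes * 4^n (hypothetical index, not PageMask). */
unsigned offset_bits(unsigned n)
{
    return 12 + 2 * n;                    /* 12 bits at 4 Kbytes, 24 at 16 Mbytes */
}

uint64_t page_size_bytes(unsigned n)
{
    return 1ULL << offset_bits(n);
}

/* Untranslated low-order bits: the byte offset within the page. */
uint64_t page_offset(uint64_t va, unsigned n)
{
    return va & (page_size_bytes(n) - 1);
}

/* Translated high-order bits: the page address. */
uint64_t page_address(uint64_t va, unsigned n)
{
    return va >> offset_bits(n);
}
```

With 4-Kbyte pages, for instance, address 0x1234 5678 splits into page address 0x12345 and offset 0x678.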


Virtual Page Size Encodings 


Page size is defined in each TLB entry’s PageMask field. This field is loaded or read using 
the PageMask register, as described in Chapter 13, PageMask Register (5). 


Each entry translates a pair of physical pages. The low bit of the virtual page address is not 
compared, because it is used to select between these two physical pages. 
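

The pairing of pages within one entry can be sketched as follows. The helper names are hypothetical; ob stands for the number of offset bits for the page size in use (12 to 24).

```c
#include <stdint.h>

/* Low bit of the page address: 0 selects the first physical page of the
 * pair, 1 selects the second. ob is the number of offset bits. */
unsigned pair_select(uint64_t va, unsigned ob)
{
    return (unsigned)((va >> ob) & 1u);
}

/* Remaining page address bits, which are the part that is compared. */
uint64_t compared_page_bits(uint64_t va, unsigned ob)
{
    return va >> (ob + 1);
}
```

With 4-Kbyte pages, addresses 0x2000 and 0x3000 share the same compared bits and differ only in the select bit, so one entry serves both.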


Using the TLB 


Translations are maintained by the operating system, using page tables in memory. A 
subset of these translations is loaded into a hardware buffer called the translation-lookaside 
buffer or TLB. The contents of this buffer are maintained by the operating 
system; if an instruction needs a translation which is not already in the buffer, an exception 
is taken so the operating system can compute and load the needed translation. If all the 
necessary translations are present, the program is executed without any delays. 


The TLB contains 64 entries, each of which maps a pair of virtual pages. Formats of TLB 
entries are shown in Figure 15-5. 


Cache Algorithm Field 


The Cache Algorithm fields of the TLB, EntryLo0, EntryLo1, and Config registers indicate 
how data is cached. Cache algorithms are described in Chapter 4, Cache Algorithms. 


Format of a TLB Entry 


Figure 15-5 shows the TLB entry formats for both 32- and 64-bit modes. Each field of an 
entry has a corresponding field in the EntryHi, EntryLo0, EntryLo1, or PageMask registers, 
as shown in Chapter 13, Coprocessor 0; for example the PFN and uncached attribute (UC) 
fields of the TLB entry are also held in the EntryLo registers. 


    Bits       Field   Width (bits)
    255:217    0       39
    216:205    MASK    12
    204:192    0       13
    191:190    R       2
    189:172    0       18
    171:141    VPN2    31
    140        G       1
    139:136    0       4
    135:128    ASID    8
    127:126    UC      2
    125:98     0       28
    97:70      PFN     28
    69:67      C       3
    66         D       1
    65         V       1
    64         0       1
    63:62      UC      2
    61:34      0       28
    33:6       PFN     28
    5:3        C       3
    2          D       1
    1          V       1
    0          0       1


Figure 15-5 Format of a TLB Entry 


Because a 64-bit address is unnecessarily large, only the low 44 address bits are translated. 
The high two virtual address bits (bits 63:62) select between user, supervisor, and kernel 
address spaces. The intermediate address bits (61:44) must either be all zeros or all ones, 
depending on the address region. The TLB does not include virtual address bits 61:59, 
because these are decoded only in the xkphys region, which is unmapped. 
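

The three-way split of a 64-bit virtual address described above can be written out as bit operations. This is an illustrative sketch with hypothetical function names; the all-zeros or all-ones check applies to mapped regions, since bits 61:59 are decoded separately in xkphys.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 63:62: selects between user, supervisor, and kernel spaces. */
unsigned va_region(uint64_t va)
{
    return (unsigned)(va >> 62);
}

/* Bits 61:44 must be all zeros or all ones in a mapped region. */
bool va_fill_bits_ok(uint64_t va)
{
    uint64_t fill = (va >> 44) & 0x3FFFF;   /* 18 bits */
    return fill == 0 || fill == 0x3FFFF;
}

/* Bits 43:0: the only address bits that are translated. */
uint64_t va_translated_bits(uint64_t va)
{
    return va & 0xFFFFFFFFFFFULL;           /* low 44 bits */
}
```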


For data cache accesses, the joint TLB (JTLB) translates addresses from the address 
calculate unit. For instruction accesses, the JTLB translates the PC address if it misses in 
the instruction TLB (ITLB). That entry is copied into the ITLB for subsequent accesses. 
The ITLB is transparent to system software. 


Address Space Identification (ASID) 


Each independent task, or process, has a separate address space, assigned a unique 8-bit 
Address Space Identifier (ASID). This identifier is stored with each TLB entry to 
distinguish between entries loaded for different processes. The ASID allows the processor 
to move from one process to another (called a context switch) without having to invalidate 
TLB entries. 


The processor’s current ASID is stored in the low 8 bits of the EntryHi register. These bits 
are also used to load the ASID field of an entry during TLB refill. 


The ASID field of each TLB entry is compared to the EntryHi register; if the ASIDs are 
equal or if the entry is global (see below), this TLB entry may be used to translate virtual 
addresses. The ASID comparison is performed only when a new value is loaded into the 
EntryHi register; the one-bit result of the match is stored in a static Enable latch. (This bit 
is set whenever a new entry is loaded.) 


Global Processes (G) 


A translation may be defined as global so that it can be shared by all processes. This G bit 
is set in the TLB entry and enables the entry independent of its ASID value. 


Avoiding TLB Conflict 


Setting the TS bit in the Status register indicates an entry being presented to the TLB 
matches more than one virtual page entry in the TLB. Any TLB entries that allow multiple 
matches, even in the Wired area, are invalidated before the new entry can be written into the 
TLB. This prevents multiple matches during address translation. 
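

The ASID and G matching rule can be sketched in software form. The entry structure below is a simplified, hypothetical model: it keeps only the VPN2, ASID, and G fields and ignores the region bits and page mask, and the linear search stands in for what the hardware does in parallel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of the fields used for matching. */
typedef struct {
    uint64_t vpn2;    /* virtual page number of the page pair */
    uint8_t  asid;    /* address space identifier of the owner */
    bool     global;  /* G bit: entry is shared by all processes */
} tlb_entry;

/* An entry is usable if the page matches and it is either global or
 * owned by the current ASID. */
bool tlb_entry_matches(const tlb_entry *e, uint64_t vpn2, uint8_t current_asid)
{
    return e->vpn2 == vpn2 && (e->global || e->asid == current_asid);
}

/* Return the index of the first matching entry, or -1 on a miss. */
int tlb_lookup(const tlb_entry *tlb, int count, uint64_t vpn2, uint8_t current_asid)
{
    for (int i = 0; i < count; i++) {
        if (tlb_entry_matches(&tlb[i], vpn2, current_asid))
            return i;
    }
    return -1;
}
```

Because the ASID takes part in the match, entries belonging to other processes can stay in the table across a context switch without being used by mistake.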


16. CPU Exceptions 


This chapter describes the processor exceptions—a general view of the cause and return of 
an exception, exception vector locations, and the types of exceptions that are supported, 
including the cause, processing, and servicing of each exception. 


16.1 Causing and Returning from an Exception 


When the processor takes an exception, the EXL bit in the Status register is set to 1, which 
means the system is in Kernel mode. After saving the appropriate state, the exception 
handler typically changes the KSU bits in the Status register to Kernel mode and resets the 
EXL bit back to 0. When restoring the state and restarting, the handler restores the previous 
value of the KSU field and sets the EXL bit back to 1. 


Returning from an exception also resets the EXL bit to 0 (see the ERET instruction in 
VR5000, VR10000 User’s Manual INSTRUCTION). 


16.2 Exception Vector Locations 


The Cold Reset, Soft Reset, and NMI exceptions are always vectored to the dedicated Cold 
Reset exception vector at an uncached and unmapped address. Addresses for all other 
exceptions are a combination of a vector offset and a base address. 


The boot-time vectors (when BEV = 1 in the Status register) are at uncached and unmapped 
addresses. During normal operation (when BEV = 0) the regular exceptions have vectors 
in cached address spaces; Cache Error is always at an uncached address so that cache error 
handling can bypass a suspect cache. 


The exception vector assignments for the R10000 processor are shown in Table 16-1; the 
addresses are the same as for the R4400. 


Table 16-1 Exception Vector Addresses 


                                           Exception Vector Address
    BEV        Exception Type              32-bit        64-bit
    -          Cold Reset/Soft Reset/NMI   0xBFC00000    0xFFFFFFFF BFC00000
    BEV = 0    TLB Refill (EXL=0)          0x80000000    0xFFFFFFFF 80000000
    BEV = 0    XTLB Refill (EXL=0)         0x80000080    0xFFFFFFFF 80000080
    BEV = 0    Cache Error                 0xA0000100    0xFFFFFFFF A0000100
    BEV = 0    Others                      0x80000180    0xFFFFFFFF 80000180
    BEV = 1    TLB Refill (EXL=0)          0xBFC00200    0xFFFFFFFF BFC00200
    BEV = 1    XTLB Refill (EXL=0)         0xBFC00280    0xFFFFFFFF BFC00280
    BEV = 1    Cache Error                 0xBFC00300    0xFFFFFFFF BFC00300
    BEV = 1    Others                      0xBFC00380    0xFFFFFFFF BFC00380
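

The base-plus-offset structure of Table 16-1 can be expressed as a small function. This is an illustrative sketch with hypothetical names; it covers only the rows listed in the table and does not model TLB refills taken with EXL = 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification matching the rows of Table 16-1. */
typedef enum {
    EXC_RESET_NMI, EXC_TLB_REFILL, EXC_XTLB_REFILL,
    EXC_CACHE_ERROR, EXC_OTHER
} exc_class;

/* 32-bit vector address: base address plus vector offset. */
uint32_t exception_vector32(exc_class c, bool bev)
{
    if (c == EXC_RESET_NMI)
        return 0xBFC00000u;                 /* dedicated Cold Reset vector */
    uint32_t offset = (c == EXC_TLB_REFILL)  ? 0x000u
                    : (c == EXC_XTLB_REFILL) ? 0x080u
                    : (c == EXC_CACHE_ERROR) ? 0x100u
                    :                          0x180u;
    if (bev)
        return 0xBFC00200u + offset;        /* boot-time vectors */
    /* Normal operation: Cache Error uses the uncached base. */
    return (c == EXC_CACHE_ERROR ? 0xA0000000u : 0x80000000u) + offset;
}

/* 64-bit vector address: the 32-bit value with the upper 32 bits set. */
uint64_t exception_vector64(exc_class c, bool bev)
{
    return 0xFFFFFFFF00000000ULL | exception_vector32(c, bev);
}
```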


16.3 TLB Refill Vector Selection 


In all present implementations of the MIPS III ISA, there are two TLB refill exception 
vectors: 


• one for references to 32-bit address space (TLB Refill) 
• one for references to 64-bit address space (XTLB Refill) 


Table 16-2 lists the exception vector addresses. 


The TLB refill vector selection is based on the address space of the address (user, 
supervisor, or kernel) that caused the TLB miss, and the value of the corresponding 
extended addressing bit in the Status register (UX, SX, or KX). The current operating mode 
of the processor is not important except that it plays a part in specifying in which address 
space an address resides. The Context and XContext registers are entirely separate 
page-table-pointer registers that point to and refill from two separate page tables; however, 
these two registers share BadVPN2 fields (see Chapter 13 for more information). For all TLB 
exceptions (Refill, Invalid, TLBL or TLBS), the BadVPN2 fields of both registers are 
loaded as they were in the R4400. 


In contrast to the R10000, the R4400 processor selects the vector based on the current 
operating mode of the processor (user, supervisor, or kernel) and the value of the 
corresponding extended addressing bit in the Status register (UX, SX or KX). In addition, 
the Context and XContext registers are not implemented as entirely separate registers; the 
PTEbase fields are shared. A miss to a particular address goes through either TLB Refill or 
XTLB Refill, depending on the source of the reference. There can be only a single page 
table unless the refill handlers execute address-deciphering and page table selection in 
software. 


NOTE: Refills for the 0.5 Gbyte supervisor mapped region, sseg/ksseg, are controlled 
by the value of KX rather than SX. This simplifies control of the processor when 
supervisor mode is not being used. 


Table 16-2 lists the TLB refill vector locations, based on the address that caused the TLB 
miss and its corresponding mode bit. 


Table 16-2. TLB Refill Vectors 


    Space        Address Range                                    Regions                  Exception Vector
    Kernel       0xFFFF FFFF E000 0000 to 0xFFFF FFFF FFFF FFFF   kseg3                    Refill (KX=0) or XRefill (KX=1)
    Supervisor   0xFFFF FFFF C000 0000 to 0xFFFF FFFF DFFF FFFF   sseg, ksseg              Refill (KX=0) or XRefill (KX=1)
    Kernel       0xC000 0000 0000 0000 to 0xC000 0FFE FFFF FFFF   xkseg                    XRefill (KX=1)
    Supervisor   0x4000 0000 0000 0000 to 0x4000 0FFF FFFF FFFF   xsseg, xksseg            XRefill (SX=1)
    User         0x0000 0000 8000 0000 to 0x0000 0FFF FFFF FFFF   xsuseg, xuseg, xkuseg    XRefill (UX=1)
    User         0x0000 0000 0000 0000 to 0x0000 0000 7FFF FFFF   useg, xuseg, suseg,      Refill (UX=0) or XRefill (UX=1)
                                                                  xsuseg, kuseg, xkuseg
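

The selection rule in Table 16-2 reduces to picking one extended-addressing bit from the region of the missing address. The sketch below is illustrative only: the names are hypothetical, and it assumes the address has already passed the address error checks and lies in a mapped region.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { VEC_REFILL, VEC_XREFILL } refill_vector;

/* Choose the refill vector for a TLB miss at va.
 * ux, sx, kx are the UX, SX, and KX bits of the Status register. */
refill_vector select_refill_vector(uint64_t va, bool ux, bool sx, bool kx)
{
    bool extended;
    switch (va >> 62) {            /* address bits 63:62 */
    case 0:                        /* user space */
        extended = ux;
        break;
    case 1:                        /* xsseg, xksseg */
        extended = sx;
        break;
    default:                       /* xkseg, kseg3, and sseg/ksseg */
        extended = kx;             /* sseg/ksseg follow KX, not SX */
        break;
    }
    return extended ? VEC_XREFILL : VEC_REFILL;
}
```

Note that the processor's current operating mode is not an input: only the address and the three Status bits are needed, which is the difference from the R4400 described above.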


Priority of Exceptions 


The remainder of this chapter describes exceptions in the order of their priority shown in 
Table 16-3 (with certain of the exceptions, such as the TLB exceptions and Instruction/Data 
exceptions, grouped together for convenience). While more than one exception can occur 
for a single instruction, only the exception with the highest priority is reported. Some 
exceptions are not caused by the instruction executed at the time, and some exceptions may 
be deferred. See the individual description of each exception in this chapter for more detail. 


Table 16-3 Exception Priority Order 


    Cold Reset (highest priority) 
    Soft Reset 
    Nonmaskable Interrupt (NMI)* 
    Cache error - Instruction cache* 
    Cache error - Data cache* 
    Cache error - Secondary cache* 
    Cache error - System interface* 
    Address error - Instruction fetch 
    TLB refill - Instruction fetch 
    TLB invalid - Instruction fetch 
    Bus error - Instruction fetch 
    Integer overflow, Trap, System Call, Breakpoint, Reserved Instruction, 
      Coprocessor Unusable, or Floating-Point Exception 
    Address error - Data access 
    TLB refill - Data access 
    TLB invalid - Data access 
    TLB modified - Data write 
    Watch* 
    Bus error - Data access 
    Interrupt (lowest priority) 


* These exceptions are interrupt types, and may be imprecise. Priority may not be followed when 
considering a specific instruction. 


Generally speaking, the exceptions described in the following sections are handled 
(“processed”) by hardware; these exceptions are then serviced by software. 


Cold Reset Exception 


Cause 


The Cold Reset exception is taken for a power-on or “cold” reset; it occurs when the 
SysGnt* signal is asserted while the SysReset* signal is also asserted.† This exception is 
not maskable. 


Processing 


The CPU provides a special interrupt vector for this exception: 


• location 0xBFC0 0000 in 32-bit mode 
• location 0xFFFF FFFF BFC0 0000 in 64-bit mode 


The Cold Reset vector resides in unmapped and uncached CPU address space, so the 
hardware need not initialize the TLB or the cache to process this exception. It also means 
the processor can fetch and execute instructions while the caches and virtual memory are 
in an undefined state. 


The contents of all registers in the CPU are undefined when this exception occurs, except 
for the following register fields: 


• In the Status register, SR and TS are cleared to 0, and ERL and BEV are set to 
  1. All other bits are undefined. 
• Config register is initialized with the boot mode bits read from the serial input. 
• The Random register is initialized to the value of its upper bound. 
• The Wired register is initialized to 0. 
• The EW bit in the CacheErr register is cleared. 
• The ErrorEPC register gets the PC. 
• The FrameMask register is set to 0. 
• Branch prediction bits are set to 0. 
• Performance Counter register Event field is set to 0. 
• All pending cache errors, delayed watch exceptions, and external interrupts 
  are cleared. 


Servicing 


The Cold Reset exception is serviced by: 


• initializing all processor registers, coprocessor registers, caches, and the 
  memory system 
• performing diagnostic tests 
• bootstrapping the operating system 


† If SysGnt* remains deasserted (high) while SysReset* is asserted, the processor interprets 
this as a Soft Reset exception. 


Soft Reset Exception 


Cause 


The Soft Reset† exception occurs in response to a Soft Reset (see Chapter 8, the section 
titled “Soft Reset Sequence”). 


A Soft Reset exception is not maskable. 


The processor differentiates between a Cold Reset and a Soft Reset as follows: 


• A Cold Reset occurs when the SysGnt* signal is asserted while the SysReset* 
  signal is also asserted. 


• A Soft Reset occurs if the SysGnt* signal remains negated when a SysReset* 
  signal is asserted. 


In the R4400 processor, there is no way for software to differentiate between a Soft Reset 
exception and an NMI exception. In the R10000 processor, a bit labelled NMI has been 
added to the Status register to distinguish between these two exceptions. Both Soft Reset 
and NMI exceptions set the SR bit and use the same exception vector. During an NMI 
exception, the NMI bit is set to 1; during a Soft Reset, the NMI bit is set to 0. 


Processing 


When a Soft Reset exception occurs, the SR bit of the Status register is set, distinguishing 
this exception from a Cold Reset exception. 


When a Soft Reset is detected, the processor initializes minimum processor state. This 
allows the processor to fetch and execute the instructions of the exception handler, which 
in turn dumps the current architectural state to external logic. Hardware state that loses 
architectural state is not initialized unless it is necessary to execute instructions from 
unmapped uncached space that reads the registers, TLB, and cache contents. 


The Soft Reset can begin on an arbitrary cycle boundary and can abort multicycle 
operations in progress, so it may alter machine state. Hence, caches, memory, or other 
processor states can be inconsistent: data cache blocks may stay at the refill state and any 
cached loads/stores to these blocks will hang the processor. Therefore, CacheOps should be 
used to dump the cache contents. 


After the processor state is read out, the processor should be reset with a Cold Reset 
sequence. 


A Soft Reset exception preserves the contents of all registers, except for: 


• ErrorEPC register, which contains the PC 
• ERL bit of the Status register, which is set to 1 
• SR bit of the Status register, which is set to 1 on Soft Reset or an NMI; 0 for 
  a Cold Reset 
• BEV bit of the Status register, which is set to 1 
• TS bit of the Status register, which is set to 0 
• PC, which is set to the reset vector 0xFFFF FFFF BFC0 0000 
• any pending Cache Error exceptions, which are cleared 


Servicing 


A Soft Reset exception is intended to quickly reinitialize a previously operating processor 
after a fatal error. 


It is not normally possible to continue program execution after returning from this 
exception, since a SysReset* signal can be accepted anytime. 


† Soft Reset is also known colloquially as Warm Reset. 
NMI Exception


Cause


The NMI exception is caused by assertion of the SysNMI* signal.
An NMI exception is not maskable.


In the R4400 processor, there is no way for software to differentiate between a Soft Reset
exception and an NMI exception. In the R10000 processor, a bit labelled NMI has been
added to the Status register to distinguish between these two exceptions. Both Soft Reset
and NMI exceptions set the SR bit and use the same exception vector. During an NMI
exception, the NMI bit is set to 1; during a Soft Reset, the NMI bit is set to 0.


Processing


When an NMI exception occurs, the SR bit of the Status register is set, distinguishing this
exception from a Cold Reset exception.


An exception caused by an NMI is taken at the instruction boundary. It does not abort any
state machines, preserving the state of the processor for diagnosis. The Cause register
remains unchanged and the system jumps to the NMI exception handler (see Table 16-1).


An NMI exception preserves the contents of all registers, except for:
* ErrorEPC register, which contains the PC
* ERL bit of the Status register, which is set to 1
* SR bit of the Status register, which is set to 1 on Soft Reset or an NMI; 0 for
a Cold Reset
* BEV bit of the Status register, which is set to 1
* TS bit of the Status register, which is set to 0
* PC is set to the reset vector 0xFFFF FFFF BFC0 0000
* clears any pending Cache Error exceptions


Servicing


The NMI can be used for purposes other than resetting the processor while preserving cache
and memory contents. For example, the system might use an NMI to cause an immediate,
controlled shutdown when it detects an impending power failure.


It is not normally possible to continue program execution after returning from this
exception, since an NMI can occur during another error exception.
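

The use of the SR and NMI bits to tell the three reset-class exceptions apart can be
sketched in C as follows. This is an illustrative sketch only: the bit positions (SR at
bit 20, NMI at bit 19) are assumed from the R10000 Status register layout and should be
checked against the Status register description, and the names ST_SR, ST_NMI, and
classify_reset are hypothetical.

```c
#include <stdint.h>

/* Assumed Status register bit positions (verify against the register description). */
#define ST_NMI (1u << 19)   /* set to 1 on an NMI, 0 on a Soft Reset */
#define ST_SR  (1u << 20)   /* set to 1 on Soft Reset or NMI, 0 on Cold Reset */

typedef enum { RESET_COLD, RESET_SOFT, RESET_NMI } reset_cause_t;

/* Classify the exception that entered the shared reset vector, given a copy
 * of the Status register read by the handler. */
static reset_cause_t classify_reset(uint32_t status)
{
    if (!(status & ST_SR)) {
        return RESET_COLD;              /* SR = 0: Cold Reset */
    }
    /* SR = 1: Soft Reset or NMI; the NMI bit separates the two on the R10000. */
    return (status & ST_NMI) ? RESET_NMI : RESET_SOFT;
}
```

On an R4400 the NMI bit does not exist, so a handler written this way would report every
SR = 1 entry as a Soft Reset.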
The Address Error exception occurs when an attempt is made to do one of the
following:


* reference an illegal address space
* reference the supervisor address space from User mode
* reference the kernel address space from User or Supervisor mode
* load or store a doubleword that is not aligned on a doubleword boundary
* load, fetch, or store a word that is not aligned on a word boundary
* load or store a halfword that is not aligned on a halfword boundary


This exception is not maskable. 


The common exception vector is used for this exception. The AdEL or AdES code in the
Cause register is set, indicating whether the instruction (as shown by the EPC register and
the BD bit in the Cause register) caused the exception with an instruction reference, load
operation, or store operation.


When this exception occurs, the BadVAddr register retains the virtual address that was not
properly aligned or that referenced protected address space. The contents of the VPN field
of the Context, XContext, and EntryHi registers are undefined, as are the contents of the
EntryLo register.


The EPC register contains the address of the instruction that caused the exception, unless
this instruction is in a branch delay slot. If it is in a branch delay slot, the EPC register
contains the address of the preceding branch instruction and the BD bit of the Cause register
is set to indicate this.


The process executing at the time is handed a UNIX SIGSEGV (segmentation violation)
signal. This error is usually fatal to the process incurring the exception.
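

The alignment conditions among the causes of this exception amount to a natural-alignment
check on the virtual address. A minimal sketch in C, assuming access sizes of 2, 4, or 8
bytes; the function name is hypothetical and the code models only the alignment rule, not
the address space checks:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an access of `size` bytes at `vaddr` is naturally aligned.
 * `size` must be a power of two: 2 (halfword), 4 (word), or 8 (doubleword). */
static bool is_naturally_aligned(uint64_t vaddr, unsigned size)
{
    return (vaddr & (uint64_t)(size - 1u)) == 0;
}
```

For example, a doubleword load at an address ending in 0x4 fails this check, while a word
load at the same address passes.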
TLB Exceptions


Three types of TLB exceptions can occur:


* TLB Refill occurs when there is no TLB entry that matches an attempted
reference to a mapped address space.
* TLB Invalid occurs when a virtual address reference matches a TLB entry that
is marked invalid.
* TLB Modified occurs when a store operation virtual address reference to
memory matches a TLB entry which is marked valid but is not dirty (the entry
is not writable).


The following three sections describe these TLB exceptions.


NOTE: TLB Refill vector selection is also described earlier in this chapter, in the 
section titled, TLB Refill Vector Selection. 
The TLB refill exception occurs when there is no TLB entry to match a reference to a
mapped address space. This exception is not maskable. 


There are two special exception vectors for this exception; one for references to 32-bit 
address spaces, and one for references to 64-bit address spaces. The UX, SX, and KX bits 
of the Status register determine whether the user, supervisor or kernel address spaces 
referenced are 32-bit or 64-bit spaces; the TLB refill vector is selected based upon the 
address space of the address causing the TLB miss (user, supervisor, or kernel mode 
address space), together with the value of the corresponding extended addressing bit in the 
Status register (UX, SX, or KX). The current operating mode of the processor is not 
important except that it plays a part in specifying in which space an address resides. An 
address is in user space if it is in useg, suseg, kuseg, xuseg, xsuseg, or xkuseg (see the 
description of virtual address spaces in Chapter 15). An address is in supervisor space if it 
is in sseg, ksseg, xsseg or xksseg, and an address is in kernel space if it is in either kseg3 or 
xkseg. Kseg0, kseg1, and kernel physical spaces (xkphys) are kernel spaces but are not 
mapped. 


All references use these vectors when the EXL bit is set to 0 in the Status register. This 
exception sets the TLBL or TLBS code in the ExcCode field of the Cause register. This code 
indicates whether the instruction, as shown by the EPC register and the BD bit in the Cause 
register, caused the miss by an instruction reference, load operation, or store operation. 


When this exception occurs, the BadVAddr, Context, XContext and EntryHi registers hold 
the virtual address that failed address translation. The EntryHi register also contains the 
ASID from which the translation fault occurred. The Random register normally contains a 
valid location in which to place the replacement TLB entry. The contents of the EntryLo 
register are undefined. The EPC register contains the address of the instruction that caused 
the exception, unless this instruction is in a branch delay slot, in which case the EPC 
register contains the address of the preceding branch instruction and the BD bit of the Cause 
register is set. 


To service this exception, the contents of the Context or XContext register are used as a 
virtual address to fetch memory locations containing the physical page frame and access 
control bits for a pair of TLB entries. The two entries are placed into the EntryLo0 and
EntryLo1 registers; the EntryHi and EntryLo registers are then written into the TLB.


It is possible that the virtual address used to obtain the physical address and access control 
information is on a page that is not resident in the TLB. This condition is processed by 
allowing a TLB refill exception in the TLB refill handler. This second exception goes to 
the common exception vector because the EXL bit of the Status register is set. 
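

The vector selection rule described in this section can be sketched in C. This is a model
under stated assumptions rather than a definitive implementation: the Status bit positions
(EXL at bit 1, UX at bit 5, SX at bit 6, KX at bit 7) are assumed from the Status register
layout, the caller is assumed to have already classified the faulting address as user,
supervisor, or kernel space, and all names are hypothetical.

```c
#include <stdint.h>

/* Assumed Status register bit positions. */
#define ST_EXL (1u << 1)
#define ST_UX  (1u << 5)
#define ST_SX  (1u << 6)
#define ST_KX  (1u << 7)

/* Space in which the faulting address resides (not the current operating mode). */
typedef enum { SPACE_USER, SPACE_SUPERVISOR, SPACE_KERNEL } addr_space_t;

typedef enum { VEC_TLB_REFILL_32, VEC_TLB_REFILL_64, VEC_COMMON } vector_t;

/* Select the vector used for a TLB miss on an address in `space`. */
static vector_t tlb_miss_vector(uint32_t status, addr_space_t space)
{
    uint32_t xbit;

    if (status & ST_EXL) {
        return VEC_COMMON;          /* a miss taken with EXL = 1 uses the common vector */
    }
    switch (space) {
    case SPACE_USER:       xbit = ST_UX; break;
    case SPACE_SUPERVISOR: xbit = ST_SX; break;
    default:               xbit = ST_KX; break;
    }
    /* The extended-addressing bit for the address's own space picks the vector. */
    return (status & xbit) ? VEC_TLB_REFILL_64 : VEC_TLB_REFILL_32;
}
```

Note that a kernel-mode reference to a user-space address is governed by UX, not KX,
because only the space of the address matters.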


TLB Invalid Exception 
The TLB invalid exception occurs when a virtual address reference matches a TLB entry
that is marked invalid (TLB valid bit cleared). This exception is not maskable. 


The common exception vector is used for this exception. The TLBL or TLBS code in the 
ExcCode field of the Cause register is set. This indicates whether the instruction, as shown 
by the EPC register and BD bit in the Cause register, caused the miss by an instruction 
reference, load operation, or store operation. 


When this exception occurs, the BadVAddr, Context, XContext and EntryHi registers 
contain the virtual address that failed address translation. The EntryHi register also 
contains the ASID from which the translation fault occurred. The Random register 
normally contains a valid location in which to put the replacement TLB entry. The contents 
of the EntryLo registers are undefined. 


The EPC register contains the address of the instruction that caused the exception unless 
this instruction is in a branch delay slot, in which case the EPC register contains the address 
of the preceding branch instruction and the BD bit of the Cause register is set. 


A TLB entry is typically marked invalid when one of the following is true:
* a virtual address does not exist
* the virtual address exists, but is not in main memory (a page fault)
* a trap is desired on any reference to the page (for example, to maintain a
reference bit)


After servicing the cause of a TLB Invalid exception, the TLB entry is located with TLBP 
(TLB Probe), and replaced by an entry with that entry’s Valid bit set. 
TLB Modified Exception


Cause


The TLB modified exception occurs when a store operation virtual address reference to
memory matches a TLB entry that is marked valid but is not dirty and therefore is not
writable. This exception is not maskable.


Processing


The common exception vector is used for this exception, and the Mod code in the Cause
register is set.


When this exception occurs, the BadVAddr, Context, XContext and EntryHi registers
contain the virtual address that failed address translation. The EntryHi register also
contains the ASID from which the translation fault occurred. The contents of the EntryLo
register are undefined.


The EPC register contains the address of the instruction that caused the exception unless
that instruction is in a branch delay slot, in which case the EPC register contains the address
of the preceding branch instruction and the BD bit of the Cause register is set.


Servicing


The kernel uses the failed virtual address or virtual page number to identify the
corresponding access control information. The page identified may or may not permit write
accesses; if writes are not permitted, a write protection violation occurs.


If write accesses are permitted, the page frame is marked dirty/writable by the kernel in its
own data structures. The TLBP instruction places the index of the TLB entry that must be
altered into the Index register. The EntryLo register is loaded with a word containing the
physical page frame and access control bits (with the D bit set), and the EntryHi and
EntryLo registers are written into the TLB.


Cache Error Exception 
The Cache Error exception is described in Chapter 9, Cache Error Exception.


Virtual Coherency Exception 


Errata 


The Virtual Coherency exception is not implemented in the R10000 processor, since the
virtual coherency condition is handled in hardware. When the hardware detects the Virtual
Coherency exception, it invalidates the lines in all other segments of the primary cache that
could cause aliasing. This takes six cycles more than that needed to refill the primary cache
line (the refill would have occurred even if there was no Virtual Coherency exception
detected).


In the R4400 processor, a Virtual Coherency exception occurs when a primary cache miss
hits in the secondary cache but VA[14:12] are not the same as the PIdx field of the
secondary cache tag, and the cache algorithm specifies that the page is cached. When such
a situation is detected in the R10000 processor, the primary cache lines at the old virtual
index are invalidated and the PIdx field of the secondary cache is written with the new
virtual index.
A Bus Error exception occurs when a processor block read, upgrade, or double/single/
partial-word read request receives an external ERR completion response, or a processor 
double/single/partial-word read request receives an external ACK completion response 
where the associated external double/single/partial-word data response contains an 
uncorrectable error. This exception is not maskable. 


The common exception vector is used for a Bus Error exception. The IBE or DBE code in
the ExcCode field of the Cause register is set, signifying whether the instruction (as
indicated by the EPC register and BD bit in the Cause register) caused the exception by an
instruction reference, load operation, or store operation.


The EPC register contains the address of the instruction that caused the exception, unless 
it is in a branch delay slot, in which case the EPC register contains the address of the 
preceding branch instruction and the BD bit of the Cause register is set. 


The physical address at which the fault occurred can be computed from information
available in the CP0 registers.


* If the IBE code in the Cause register is set (indicating an instruction fetch
reference), the instruction that caused the exception is located at the virtual
address contained in the EPC register (or 4 + the contents of the EPC register
if the BD bit of the Cause register is set).
* If the DBE code is set (indicating a load or store reference), the instruction
that caused the exception is located at the virtual address contained in the
EPC register (or 4 + the contents of the EPC register if the BD bit of the Cause
register is set).


The virtual address of the load and store reference can then be obtained by interpreting the 
instruction. The physical address can be obtained by using the TLBP instruction and 
reading the EntryLo registers to compute the physical page number. The process executing 
at the time of this exception is handed a UNIX SIGBUS (bus error) signal, which is usually 
fatal. 
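

The address adjustment for the branch delay slot can be sketched in C. The field positions
used here (BD at bit 31 and ExcCode in bits 6:2 of the Cause register) are assumed from the
Cause register layout, and the function names are hypothetical; the handler is assumed to
pass in copies of the EPC and Cause registers.

```c
#include <stdint.h>

/* Assumed Cause register fields. */
#define CAUSE_BD            (1u << 31)  /* exception occurred in a branch delay slot */
#define CAUSE_EXCCODE_SHIFT 2
#define CAUSE_EXCCODE_MASK  0x1Fu

/* Virtual address of the instruction that took the exception: the EPC itself,
 * or the EPC plus 4 when the instruction sits in a branch delay slot. */
static uint64_t faulting_instruction_address(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* Extract the ExcCode field so the handler can tell IBE from DBE. */
static unsigned cause_exc_code(uint32_t cause)
{
    return (cause >> CAUSE_EXCCODE_SHIFT) & CAUSE_EXCCODE_MASK;
}
```

For a DBE the handler would still have to decode the instruction at this address to
recover the load or store virtual address, as described above.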
Integer Overflow Exception


Cause 
An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, DADDI or 
DSUB instruction results in a 2’s complement overflow. This exception is not maskable. 
Processing 


The common exception vector is used for this exception, and the OV code in the Cause 
register is set. 


The EPC register contains the address of the instruction that caused the exception unless 
the instruction is in a branch delay slot, in which case the EPC register contains the address 
of the preceding branch instruction and the BD bit of the Cause register is set. 


Servicing 


The process executing at the time of the exception is handed a UNIX SIGFPE/ 
FPE_INTOVF_TRAP (floating-point exception/integer overflow) signal. This error is 
usually fatal to the current process. 


Trap Exception


Cause


The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, TGEI, TGEUI,
TLTI, TLTUI, TEQI, or TNEI instruction results in a TRUE condition. This exception is not
maskable.


Processing


The common exception vector is used for this exception, and the Tr code in the Cause
register is set.


The EPC register contains the address of the instruction causing the exception unless the
instruction is in a branch delay slot, in which case the EPC register contains the address of
the preceding branch instruction and the BD bit of the Cause register is set.


Servicing


The process executing at the time of a Trap exception is handed a UNIX SIGFPE/
FPE_INTOVF_TRAP (floating-point exception/integer overflow) signal. This error is
usually fatal.


System Call Exception 
A System Call exception occurs during an attempt to execute the SYSCALL instruction.
This exception is not maskable. 


The common exception vector is used for this exception, and the Sys code in the Cause 
register is set. 


The EPC register contains the address of the SYSCALL instruction unless it is in a branch 
delay slot, in which case the EPC register contains the address of the preceding branch 
instruction. 


If the SYSCALL instruction is in a branch delay slot, the BD bit of the Cause register is set;
otherwise this bit is cleared.


When the System Call exception occurs, control is transferred to the applicable system
routine. Additional distinctions can be made by loading the contents of the instruction
whose address the EPC register contains and analyzing the Code field of the SYSCALL
instruction (bits 25:6).


To resume execution, the EPC register must be altered so that the SYSCALL instruction 
does not re-execute; this is accomplished by adding a value of 4 to the EPC register (EPC 
register + 4) before returning. 


If a SYSCALL instruction is in a branch delay slot, a more complicated algorithm, beyond 
the scope of this description, may be required. 
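

Extracting the Code field and advancing the EPC can be sketched in C. The sketch assumes
the handler has already loaded the 32-bit instruction word from the address in the EPC
register and that the SYSCALL is not in a branch delay slot. The SYSCALL encoding used for
the check (major opcode SPECIAL = 0, function field 0x0C) comes from the MIPS instruction
encoding rather than this section, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_SPECIAL 0x00u   /* major opcode, bits 31:26 */
#define FUNCT_SYSCALL  0x0Cu   /* function field, bits 5:0 */

/* True when the instruction word encodes SYSCALL. */
static bool is_syscall(uint32_t insn)
{
    return (insn >> 26) == OPCODE_SPECIAL && (insn & 0x3Fu) == FUNCT_SYSCALL;
}

/* Code field of the SYSCALL instruction, bits 25:6 (20 bits). */
static uint32_t syscall_code(uint32_t insn)
{
    return (insn >> 6) & 0xFFFFFu;
}

/* EPC value to return to so that the SYSCALL does not re-execute.
 * Valid only when the SYSCALL is not in a branch delay slot. */
static uint64_t epc_after_syscall(uint64_t epc)
{
    return epc + 4;
}
```

The same field extraction applies to the Code field of the BREAK instruction, which
occupies the same bit positions.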
A Breakpoint exception occurs when an attempt is made to execute the BREAK instruction.
This exception is not maskable. 


The common exception vector is used for this exception, and the BP code in the Cause 
register is set. 


The EPC register contains the address of the BREAK instruction unless it is in a branch 
delay slot, in which case the EPC register contains the address of the preceding branch 
instruction. 


If the BREAK instruction is in a branch delay slot, the BD bit of the Cause register is set;
otherwise the bit is cleared.


When the Breakpoint exception occurs, control is transferred to the applicable system 
routine. Additional distinctions can be made by analyzing the Code field of the BREAK 
instruction (bits 25:6), and loading the contents of the instruction whose address the EPC 
register contains. A value of 4 must be added to the contents of the EPC register (EPC 
register + 4) to locate the instruction if it resides in a branch delay slot. 


To resume execution, the EPC register must be altered so that the BREAK instruction does 
not re-execute; this is accomplished by adding a value of 4 to the EPC register (EPC register 
+4) before returning. 


If a BREAK instruction is in a branch delay slot, interpretation of the branch instruction is 
required to resume execution. 
Reserved Instruction Exception


Cause


The Reserved Instruction exception occurs when one of the following conditions occurs:


* an attempt is made to execute an instruction with an undefined major opcode
(bits 31:26)
* an attempt is made to execute a SPECIAL instruction with an undefined minor
opcode (bits 5:0)
* an attempt is made to execute a REGIMM instruction with an undefined minor
opcode (bits 20:16)
* an attempt is made to execute 64-bit operations in 32-bit mode when in User
or Supervisor modes
* an attempt is made to execute a COP1X instruction when the MIPS IV ISA is not
enabled


64-bit operations are always valid in Kernel mode regardless of the value of the KX bit in
the Status register.


This exception is not maskable.


Processing


The common exception vector is used for this exception, and the RI code in the Cause
register is set.


The EPC register contains the address of the reserved instruction unless it is in a branch
delay slot, in which case the EPC register contains the address of the preceding branch
instruction.


Servicing


No instructions in the MIPS ISA are currently interpreted. The process executing at the
time of this exception is handed a UNIX SIGILL/ILL_RESOP_FAULT (illegal instruction/
reserved operand fault) signal. This error is usually fatal.
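

The three opcode fields named under Cause can be extracted from a 32-bit instruction word
as in the C sketch below. The function names are hypothetical, and deciding whether a
given field value is actually undefined requires the opcode tables, which are not
reproduced here.

```c
#include <stdint.h>

/* Major opcode, bits 31:26. */
static unsigned major_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Minor opcode of a SPECIAL instruction, bits 5:0. */
static unsigned special_minor_opcode(uint32_t insn)
{
    return insn & 0x3Fu;
}

/* Minor opcode of a REGIMM instruction, bits 20:16. */
static unsigned regimm_minor_opcode(uint32_t insn)
{
    return (insn >> 16) & 0x1Fu;
}
```

A kernel that chose to emulate reserved instructions would start from these fields; as
noted above, no instructions are currently interpreted.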
Coprocessor Unusable Exception


The Coprocessor Unusable exception occurs when an attempt is made to execute a
coprocessor instruction for either:


* a corresponding coprocessor unit (CP1 or CP2) that has not been marked
usable, or
* CP0 instructions, when the unit has not been marked usable and the process
executes in either User or Supervisor mode.


This exception is not maskable. 


The common exception vector is used for this exception, and the CpU code in the Cause 
register is set. The contents of the Coprocessor Usage Error field of the coprocessor 
Control register indicate which of the four coprocessors was referenced. The EPC register 
contains the address of the unusable coprocessor instruction unless it is in a branch delay 
slot, in which case the EPC register contains the address of the preceding branch 
instruction. 


The coprocessor unit to which an attempted reference was made is identified by the 
Coprocessor Usage Error field, which results in one of the following situations: 


* If the process is entitled access to the coprocessor, the coprocessor is marked
usable and the corresponding user state is restored to the coprocessor.
* If the process is entitled access to the coprocessor, but the coprocessor does
not exist or has failed, interpretation of the coprocessor instruction is possible.
* If the BD bit is set in the Cause register, the branch instruction must be
interpreted; then the coprocessor instruction can be emulated and execution
resumed with the EPC register advanced past the coprocessor instruction.
* If the process is not entitled access to the coprocessor, the process executing
at the time is handed a UNIX SIGILL/ILL_PRIVIN_FAULT (illegal
instruction/privileged instruction fault) signal. This error is usually fatal.
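

Identifying the referenced coprocessor can be sketched in C. The sketch assumes that the
Coprocessor Usage Error field is the CE field in bits 29:28 of the Cause register and that
the CpU exception code is 11 in the ExcCode field (bits 6:2); both are taken from the
MIPS Cause register layout rather than from this section and should be verified. The
function name is hypothetical.

```c
#include <stdint.h>

#define EXC_CPU 11u   /* assumed ExcCode value for Coprocessor Unusable */

/* Returns the coprocessor unit number (0 to 3) referenced by the faulting
 * instruction, or -1 if the Cause value does not report a CpU exception. */
static int unusable_coprocessor(uint32_t cause)
{
    unsigned exc_code = (cause >> 2) & 0x1Fu;   /* ExcCode, bits 6:2 */

    if (exc_code != EXC_CPU) {
        return -1;
    }
    return (int)((cause >> 28) & 0x3u);         /* CE, bits 29:28 */
}
```

A return value of 1 would then send the handler down the floating-point coprocessor
branch of the cases listed above.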
Floating-Point Exception


Cause 
The Floating-Point exception is used by the floating-point coprocessor. This exception is 
not maskable. 

Processing 
The common exception vector is used for this exception, and the FPE code in the Cause 
register is set. 
The contents of the Floating-Point Control/Status register indicate the cause of this 
exception. 

Servicing 


This exception is cleared by clearing the appropriate bit in the Floating-Point Control/ 
Status register. 
A Watch exception occurs when a load or store instruction references the physical address
specified in the WatchLo/WatchHi System Control Coprocessor (CP0) registers. The
WatchLo register specifies whether a load or store initiated this exception.


A Watch exception violates the rules of a precise exception in the following way: If the load 
or store reference which triggered the Watch exception has a cacheable address and misses 
in the data cache, the line will then be read from memory into the secondary cache if 
necessary, and refilled from the secondary cache into the data cache. In all other cases, 
cache state is not affected by an instruction which takes a Watch exception. 


The CACHE instruction never causes a Watch exception. 


The Watch exception is postponed if either the EXL or ERL bit is set in the Status register. 
If either bit is set, the instruction referencing the WatchLo/WatchHi address is executed and 
the exception is delayed until the delay condition is cleared; that is, until ERL and EXL both 
are cleared (set to 0). The EPC contains the address of the next unexecuted instruction. 


A delayed Watch exception is cleared by system reset or by writing a value to the WatchLo
register.


Watch is maskable by setting the EXL or ERL bits in the Status register.


The common exception vector is used for this exception, and the Watch code in the Cause 
register is set. 


The Watch exception is a debugging aid; typically the exception handler transfers control 
to a debugger, allowing the user to examine the situation. 


To continue program execution, the Watch exception must be disabled to execute the 
faulting instruction. The Watch exception must then be reenabled. The faulting instruction 
can be executed either by interpretation or by setting breakpoints. 


+ An MTC0 to the WatchLo register clears a delayed Watch exception.
The Interrupt exception occurs when one of the eight interrupt conditions is asserted. The
significance of these interrupts is dependent upon the specific system implementation. 


Each of the eight interrupts can be masked by clearing the corresponding bit in the 
Interrupt-Mask (IM) field of the Status register, and all of the eight interrupts can be masked 
at once by clearing the IE bit of the Status register.


The common exception vector is used for this exception, and the Int code in the Cause
register is set. 


The IP field of the Cause register indicates current interrupt requests. It is possible that
more than one of the bits can be simultaneously set (or even no bits may be set) if the 
interrupt is asserted and then deasserted before this register is read. 


On Cold Reset, an R4400 processor can be configured with IP[7] either as a sixth external
interrupt, or as an internal interrupt set when the Count register equals the Compare register.
There is no such option on the R10000 processor; IP[7] is always an internal interrupt that
is set when one of the following occurs:


* the Count register is equal to the Compare register
* either one of the two performance counters overflows


Software needs to poll each source to determine the cause of the interrupt (which could 
come from more than one source at a time). For instance, writing a value to the Compare 
register clears the timer interrupt but it may not clear IP[7] if one of the performance
counters is simultaneously overflowing. Performance counter interrupts can be disabled 
individually without affecting the timer interrupt, but there is no way to disable the timer 
interrupt without disabling the performance counter interrupt. 


If the interrupt is caused by one of the two software-generated exceptions (described in
Chapter 6, the section titled "Software Interrupts"), the interrupt condition is cleared by
setting the corresponding Cause register bit, IP[1:0], to 0. Software interrupts are
imprecise. Once the software interrupt is enabled, program execution may continue for 
several instructions before the exception is taken. Timer interrupts are cleared by writing 
to the Compare register. The Performance Counter interrupt is cleared by writing a 0 to bit 
31, the overflow bit, of the counter. 


Cold Reset and Soft Reset exceptions clear all the outstanding external interrupt requests, 
IP[2] to IP[6]. 


If the interrupt is hardware-generated, the interrupt condition is cleared by correcting the 
condition causing the interrupt pin to be asserted. 
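

The masking rules above can be modelled in C as a function that returns the interrupt
requests that are both pending and enabled. The bit positions are assumed from the Status
and Cause register layouts (IE at Status bit 0, IM in Status bits 15:8, IP in Cause bits
15:8). The additional check of EXL and ERL reflects the general rule that interrupts are
not taken while either bit is set; that rule is an assumption here, not something stated
in this section. All names are hypothetical.

```c
#include <stdint.h>

/* Assumed Status register bits. */
#define ST_IE  (1u << 0)
#define ST_EXL (1u << 1)
#define ST_ERL (1u << 2)

/* IM (Status) and IP (Cause) occupy the same bit positions, 15:8. */
#define INT_FIELD_SHIFT 8
#define INT_FIELD_MASK  0xFFu

/* Returns an 8-bit mask of interrupt requests that would be taken:
 * bit n is set when IP[n] is pending and IM[n] is set, provided that
 * interrupts are globally enabled. */
static unsigned enabled_pending_interrupts(uint32_t status, uint32_t cause)
{
    if (!(status & ST_IE) || (status & (ST_EXL | ST_ERL))) {
        return 0;                       /* all eight interrupts masked */
    }
    return ((status & cause) >> INT_FIELD_SHIFT) & INT_FIELD_MASK;
}
```

Because IP[7] on the R10000 combines the timer and both performance counters, a set bit 7
in the result still requires the handler to poll each source.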
16.4 MIPS IV Instructions


The system must either be in Kernel or Supervisor mode, or have set the XX bit of the Status
register to a 1 in order to use the MIPS IV instruction set. In User mode, if XX is a 0 and
an attempt is made to execute MIPS IV instructions, an exception will be taken. The type
of exception that will be taken depends upon the type of instruction whose execution was
attempted; a list is given in Table 16-4. Note that operating with MIPS IV instructions does
not require that the MIPS III instruction set or 64-bit addressing be enabled.


MIPS IV instructions that use or modify the floating-point registers (CP1 state) are also
affected by the CU1 bit of the CP0 Status register. If CU1 is not set, a Coprocessor
Unusable exception may be signaled.


The Reserved Instruction (RI), Coprocessor Unusable (CU), and Unimplemented
Operation (UO) exceptions for MIPS IV instructions are listed in Table 16-4 below.


Table 16-4 MIPS IV Instruction Exceptions 


Exceptions Instructions CU1 | MIPS4 


RI CPU (undefined) - - 
RI MOVN,Z E 
RI 
MOVT,F 
CU 
RI PREF - 


COP1 (all instructions)
UO (undefined) 7 
RI BC (cc>0) 
C (cc>0) 
MOVN,Z,T,F 
UO RECIP, RSQRT 
COP1X (all instructions) 
CU (all instructions) 0 
(undefined) 


16.5 COP0 Instructions


Execution of an RFE instruction causes a Reserved Instruction exception in the R10000
processor.


The execution of undefined COP0 functions is undefined in the R10000 processor.


16.6 COP1 Instructions


The R10000 and R4400 processors do not generate the same exceptions for undefined
COP1 instructions. In the R4400 processor, undefined opcodes or formats in the sub field
take an Unimplemented Operation exception. In the R10000 processor, undefined
opcodes (bits 25:24 are 0 or 1) take Reserved Instruction exceptions and undefined formats
(bits 25:24 are 2 or 3) take Unimplemented Operation exceptions.


In MIPS II on an R4400 processor, the execution of DMTC1, DMFC1, and the L format takes
Unimplemented Operation exceptions. In MIPS II on the R10000 processor, the execution
of DMTC1 and DMFC1 takes Reserved Instruction exceptions.


The attempted execution of the L format takes an Unimplemented Operation exception 
when the MIPS III mode is not enabled. 


A CTC instruction that sets both Cause and Enable bits also forces an immediate floating- 
point exception; the EPC register points to the offending CTC1 instruction. 


16.7 COP2 Instructions 


If the CU2 bit of the CP0 Status register is not set during an attempted execution of such
Coprocessor 2 instructions as COP2, LWC2, SWC2, LDC2, and SDC2, the system takes a
Coprocessor Unusable exception.


In the R4400 processor, if the CU2 bit is set, COP2 instructions are handled as NOPs; the
operations of Coprocessor 2 load/store instructions are undefined. In the R10000
processor, execution of a Coprocessor 2 instruction takes a Reserved Instruction
exception when the CU2 bit is set.
17. Cache Test Mode


The R10000 processor provides a cache test mode that may be used during manufacturing 
test and system debug to access the following internal RAM arrays: 


* data cache data array
* data cache tag array
* instruction cache data array
* instruction cache tag array
* secondary cache way prediction table
Cache test mode is accessed by using a subset of the system interface signals. By not
requiring the use of any secondary cache interface signals, the internal RAM arrays may be 
accessed for single-chip LGA as well as R10000/secondary cache module configurations. 


The following system interface signals are used during cache test mode:
* SysAD(57:0)
* SysVal*


Any input signals not listed above are ignored by the processor when it is operating in cache 
test mode, and any output signals not listed above are undefined during cache test mode. 


17.2 System Interface Clock Divisor 
Cache test mode is supported for all system interface clock speeds. However, since cache
test mode repeat rates and latencies are expressed in terms of PClk cycles, the external
agent must take care when operating at any system interface clock divisor other than
Divide-by-1.


17.3 Entering Cache Test Mode 


In order for the processor to enter cache test mode, the external agent must begin a Power- 
on or Cold Reset sequence. 


[Timing diagram signals: Cycle, SysClk, Master, SysReset*, SysGnt*, SysAD(63:0), SysVal*, SysRespVal*]
Rather than negating SysReset* at the end of the reset sequence, the external agent loads
the mode bits into the processor by driving the mode bits (with the CTM signal asserted)
on SysAD(63:0), waits at least two SysClk cycles, and then asserts SysGnt* for at least
one SysClk cycle.


After waiting at least another 100 ms, the external agent may issue the first cache test mode
command.


Figure 17-1 shows the cache test mode entry sequence. 
[Figure annotations: Assert CTM mode bit; First cache test mode command]


Figure 17-1 Cache Test Mode Entry Sequence
17.4 Exit Sequence


[Timing diagram signals: Cycle, SysClk, Master, SysReset*, SysGnt*, SysAD(63:0), SysVal*, SysRespVal*]


To leave cache test mode, the external agent does the following: 


* loads the mode bits into the processor by driving the mode bits (with the
CTM mode bit negated) on SysAD(63:0)
* waits at least two SysClk cycles
* asserts SysGnt* for at least one SysClk cycle


After at least one SysClk cycle, the external agent may negate SysReset* to end the reset 
sequence. 


Figure 17-2 shows the cache test mode exit sequence. 


[Timing diagram: Cycle, SysClk, Master, SysReset*, SysGnt*, SysAD(63:0), SysVal*, and SysRespVal*, annotated "Negate CTM mode bit"]


Figure 17-2 Cache Test Mode Exit Sequence


17.5 SysAD(63:0) Encoding


Encoding of the SysAD(63:0) bus during cache test mode is shown in Table 17-1. 
“Unused” fields are read as “undefined,” and must be written as zeroes. 


Table 17-1 Cache Test Mode SysAD(63:0) Encoding


SysAD Bit   Field
39:0        Data (layout depends on the selected array)
42:40       Array select
43          Write/Read select
44          Auto-increment select
45          Way
57:46       Address
63:58       Unused


[Table detail: the per-array columns (Data Cache Data Array, Data Cache Tag Array,
Instruction Cache Data Array, Instruction Cache Tag Array, Secondary Cache Way
Prediction Array) assign the fields Tag parity, MRU, SCWay, Tag, Data parity, StateMod,
and Unused within SysAD(39:0).]
17.6 Cache Test Mode Protocol


This section describes the cache test mode protocol in detail, including:
* normal write protocol
* auto-increment write protocol
* normal read protocol
* auto-increment read protocol


Normal Write Protocol 


A cache test mode normal write operation writes a selected RAM array. The write address, 
way, array, and data are specified in the write command. 


The external agent issues a normal write command by:
* driving the address on SysAD(57:46)
* driving the way on SysAD(45)
* negating the auto-increment select on SysAD(44)
* asserting the Write/Read select on SysAD(43)
* driving the array select on SysAD(42:40)
* driving the write data on SysAD(39:0)
* asserting SysVal* for one SysClk cycle


Normal writes have a repeat rate of 8 PClk cycles.
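

The field positions listed above can be packed into a single 64-bit SysAD value. The following is a minimal sketch, assuming that an "asserted" select is encoded as 1 and a "negated" select as 0; the manual does not state the polarity in this section. The function and constant names are hypothetical, and the numeric array select codes are left to the caller because they are not given here.

```python
# Field positions taken from the command description in this section.
ADDRESS_SHIFT, ADDRESS_WIDTH = 46, 12   # SysAD(57:46)
WAY_SHIFT = 45                          # SysAD(45)
AUTO_INC_SHIFT = 44                     # SysAD(44)
WRITE_SHIFT = 43                        # SysAD(43)
ARRAY_SHIFT, ARRAY_WIDTH = 40, 3        # SysAD(42:40)
DATA_WIDTH = 40                         # SysAD(39:0)


def _field(value: int, width: int, name: str) -> int:
    """Reject values that do not fit in the field."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name} does not fit in {width} bits")
    return value


def encode_command(address=0, way=0, auto_increment=False, write=False,
                   array_select=0, data=0) -> int:
    """Pack one cache test mode command into a SysAD(63:0) value.

    Assumption: asserted selects are 1, negated selects are 0.
    Bits 63:58 are unused and are written as zeroes.
    """
    return (
        (_field(address, ADDRESS_WIDTH, "address") << ADDRESS_SHIFT)
        | (_field(way, 1, "way") << WAY_SHIFT)
        | (int(bool(auto_increment)) << AUTO_INC_SHIFT)
        | (int(bool(write)) << WRITE_SHIFT)
        | (_field(array_select, ARRAY_WIDTH, "array_select") << ARRAY_SHIFT)
        | _field(data, DATA_WIDTH, "data")
    )


def normal_write(address: int, way: int, array_select: int, data: int) -> int:
    """Normal write: auto-increment negated, Write/Read asserted."""
    return encode_command(address=address, way=way, write=True,
                          array_select=array_select, data=data)
```

The range checks are a design choice of the sketch: they catch a value that would otherwise spill into a neighboring field before it is driven onto the bus.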


Figure 17-3 depicts two cache test mode normal writes. 


[Timing diagram: Cycle, SysClk, Master, SysReset*, SysGnt*, SysAD(63:0), and SysVal*]


Figure 17-3 Cache Test Mode Normal Write Protocol


Auto-Increment Write Protocol


A cache test mode auto-increment write operation writes a selected RAM array. The write
address is obtained by incrementing the previous write address, and the write way is
obtained from the previous write way.


If an overflow occurs when incrementing the previous write address, the address wraps to
0, and the way is toggled.


The write data is identical to the previous write data.


For proper results, an auto-increment write must always be preceded by a normal or
auto-increment write.


The external agent issues an auto-increment write command by:
* asserting the auto-increment select on SysAD(44)
* asserting the Write/Read select on SysAD(43)
* driving the array select on SysAD(42:40)
* asserting SysVal* for one SysClk cycle


Auto-increment writes have a repeat rate of one PClk cycle.
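

The address and way progression described above can be modeled in a few lines. This is a sketch under one stated assumption: overflow is taken to occur at the full 12-bit width of the address field SysAD(57:46), whereas the real overflow point may depend on the depth of the selected array. The function names are hypothetical.

```python
ADDRESS_WIDTH = 12  # SysAD(57:46) carries a 12-bit address (assumed overflow width)


def next_access(address: int, way: int, address_width: int = ADDRESS_WIDTH):
    """Return the (address, way) pair used by the next auto-increment access."""
    if way not in (0, 1):
        raise ValueError("way must be 0 or 1")
    if not 0 <= address < (1 << address_width):
        raise ValueError("address out of range")
    address += 1
    if address == (1 << address_width):
        # Overflow: the address wraps to 0 and the way is toggled.
        return 0, way ^ 1
    return address, way


def auto_increment_sequence(address: int, way: int, count: int,
                            address_width: int = ADDRESS_WIDTH):
    """List the (address, way) pairs touched by `count` auto-increment accesses."""
    pairs = []
    for _ in range(count):
        address, way = next_access(address, way, address_width)
        pairs.append((address, way))
    return pairs
```

Starting from a normal write to address 0 of way 0, repeated auto-increment writes would walk way 0 first and then continue into way 1 after the wrap.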


Figure 17-4 depicts three cache test mode auto-increment writes. 


[Timing diagram: three consecutive auto-increment write commands (IncWr) on SysAD(63:0)]


Figure 17-4 Cache Test Mode Auto-Increment Write Protocol


Normal Read Protocol


A cache test mode normal read operation reads a selected RAM array. The read address, 
way, and array are specified by the read command. 


The external agent issues a normal read command by:
* driving the address on SysAD(57:46)
* driving the way on SysAD(45)
* negating the auto-increment select on SysAD(44)
* negating the Write/Read select on SysAD(43)
* driving the array select on SysAD(42:40)
* asserting SysVal* for one SysClk cycle


After a read latency of 15 PClk cycles, the processor provides the read response by:
* entering Master state
* driving the read data on SysAD(39:0)
* asserting SysVal* for one SysClk cycle


In the following SysClk cycle, the processor reverts to Slave state.


Normal reads have a repeat rate of 17 PClk cycles.
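

The latency and repeat rate can be combined into a simple schedule of command and response cycles. The sketch below assumes that the latency is counted from the cycle in which the command is issued and that commands are issued back to back at the repeat rate; both are interpretations for illustration, and the function name is hypothetical.

```python
READ_LATENCY = 15   # PClk cycles from read command to read response
READ_REPEAT = 17    # PClk cycles between successive read commands


def read_schedule(count: int, first_issue: int = 0):
    """Return [(issue_cycle, response_cycle), ...] in PClk cycles.

    Assumption: latency is measured from the issue cycle of each command.
    """
    schedule = []
    for i in range(count):
        issue = first_issue + i * READ_REPEAT
        schedule.append((issue, issue + READ_LATENCY))
    return schedule
```

Under these assumptions, each response falls two cycles before the next command may be issued, which appears consistent with the processor spending one cycle as master and reverting to Slave state in the next.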


Figure 17-5 depicts two cache test mode normal reads. 


[Timing diagram: Cycle, SysClk, Master, SysReset*, SysGnt*, SysAD(63:0), and SysVal*]


Figure 17-5 Cache Test Mode Normal Read Protocol


Auto-Increment Read Protocol


A cache test mode auto-increment read operation reads a selected RAM array. The read
address is obtained by incrementing the previous access address, and the read way is
obtained from the previous access way.


If an overflow occurs when incrementing the previous access address, the address wraps to
0, and the way is toggled.


The external agent issues an auto-increment read command by:
* asserting the auto-increment select on SysAD(44)
* negating the Write/Read select on SysAD(43)
* driving the array select on SysAD(42:40)
* asserting SysVal* for one SysClk cycle


After a read latency of 15 PClk cycles, the processor provides the read response by:
* entering Master state
* driving the read data on SysAD(39:0)
* asserting SysVal* for one SysClk cycle


In the following SysClk cycle, the processor reverts to Slave state.


Auto-increment reads have a repeat rate of 17 PClk cycles.


Figure 17-6 depicts two cache test mode auto-increment reads. 


[Timing diagram: Cycle, SysClk, Master, SysReset*, SysGnt*, SysAD(63:0), and SysVal*]


Figure 17-6 Cache Test Mode Auto-Increment Read Protocol


Appendix A Glossary


The following terms are defined in this Glossary: 


superscalar processor 

pipeline 

pipeline latency 

pipeline repeat rate 

out-of-order execution 

dynamic scheduling 

instruction fetch, decode, issue, execution, completion, and graduation 
active list 

free list and busy registers 
register renaming and unnaming 
nonblocking loads and stores 
speculative branching 

logical and physical registers 
register files 


ANDES architecture


A.1 Superscalar Processor


A superscalar processor is one that can fetch, execute and complete more than one
instruction in parallel. By implication, a superscalar processor has more than one pipeline
(see below).


A.2 Pipeline


In the processor pipeline, the execution of each instruction is divided into a sequence of
simpler suboperations. Each suboperation is performed by a separate hardware section
called a stage, and each stage passes its result to a succeeding stage.


Normally, each instruction only remains in each stage for a single cycle, and each stage
begins executing a new instruction as previous instructions are being completed in later
stages. Thus, a new instruction can often begin during every cycle.


Pipelines greatly improve the rate at which instructions can be executed, as long as there
are no dependencies. The efficient use of a pipeline requires that several instructions be
executed in parallel; however, the result of any instruction is not available for several cycles
after that instruction has entered the pipeline. Thus, new instructions must not depend on
the results of instructions which are still in the pipeline.


A.3 Pipeline Latency


The latency of an execution pipeline is the number of cycles between the time an
instruction is issued and the time a dependent instruction (which uses its result as an
operand) can be issued.


In the R10000 processor, most integer instructions have a single-cycle latency, load
instructions have a 2-cycle latency for cache hits, and floating-point addition and
multiplication have a 2-cycle latency. Integer multiply, floating-point square-root, and all
divide instructions are computed iteratively and have longer latencies.
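

As an illustration of this definition, the sketch below computes the earliest issue cycle for each instruction in a chain where every instruction depends on the one before it. The latencies are the ones quoted above; the operation labels and the function name are hypothetical, and the model assumes that nothing other than the data dependency delays issue.

```python
# Latencies in cycles, as quoted in the text for the R10000.
LATENCY = {"int": 1, "load_hit": 2, "fp_add": 2, "fp_mul": 2}


def issue_cycles(chain, start: int = 0):
    """Earliest issue cycle of each op when each depends on its predecessor."""
    cycles = []
    cycle = start
    for op in chain:
        cycles.append(cycle)
        # A dependent instruction can issue `latency` cycles after its producer.
        cycle += LATENCY[op]
    return cycles
```

For instance, a load that hits the cache followed by two dependent integer operations would issue in cycles 0, 2, and 3 under this model.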


A.4 Pipeline Repeat Rate 


The repeat rate of the pipeline is the number of cycles that occur between the issuance of 
one instruction and the issuance of the next instruction to the same execution unit. In the 
R10000 processor, the main five pipelines all have repeat rates of one cycle, but the iterative 
units have longer repeat delays. 


A.5 Out-of-Order Execution


The “program order” of instructions is the sequence in which they are fetched and decoded.
In the R10000 processor, instructions may be issued, executed, and completed out of
program order. They are always graduated in program order.


A.6 Dynamic Scheduling


The R10000 processor can issue instructions to functional units out of program order; this 
capability is known as dynamic scheduling or dynamic issuing. 


The R10000 processor can dynamically issue an instruction as soon as all its operands are
available and the required execution unit is not busy. Thus, an instruction is not delayed by
a stalled previous instruction unless it needs the results of that previous instruction.


A.7 Instruction Fetch, Decode, Issue, Execution, Completion, and Graduation


In general, instructions are fetched, decoded, and graduated in their original program order,
but may be issued, executed, and completed out of program order, as shown in Figure A-1.


* Instruction fetching is the process of reading instructions from the instruction
cache.


* Instruction decode includes register renaming and initial dependency checks.
For branch instructions, the branch path is predicted and the target address is
computed.


* An instruction is issued when it is handed over to a functional unit for
execution.


* An instruction is complete when its result has been computed and stored in a
temporary physical register.


* An instruction graduates when this temporary result is committed as the new
state of the processor. An instruction can graduate only after it and all
previous instructions have been successfully completed.


[Diagram: Fetch and Decode occur in order; Issue, Execute, and Complete may occur out of order; Graduate occurs in order; time runs left to right]


Figure A-1 Dynamic Scheduling


A.8 Active List


The R10000 processor’s active list is a program-order list of decoded instructions. For
each instruction, the active list indicates the physical register which contained the previous
value of the destination register (if any). If this instruction graduates, that previous value
is discarded and the physical register is returned to the free list. The active list records
status, such as those instructions that have completed, or those instructions that have
detected exceptions. Instructions are appended to the bottom of the list as they are decoded
and instructions are removed from the top as they graduate.
<R12000>

Active List entries are increased to 48:

The active list has been enlarged so that it now contains 48 entries.

Active list accepts conservatively:


The read pointer for the active list is now evaluated one four-instruction block at a time.
This has two effects:


a) There may be up to 11 empty slots in the active list and yet it will report to the decode
unit that it cannot accept any new instructions. However, this level of blockage only lasts
for a single cycle. At most three empty slots will remain empty for more than one cycle.
The time at which instructions are removed from the active list has also been changed.
Integer and load/store instructions now remain in the list for one cycle after they
graduate. This will be compensated for by the increased size of the active list.


b) The graduation of some instructions will be delayed, as the read pointer will not
advance past the end of a four-instruction block during a cycle. Thus fewer than the
maximum number of instructions might be graduated because the read pointer cannot
get to them that cycle.
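

The basic behavior of the active list (append at the bottom on decode, remove from the top on graduation, return the previous physical register to the free list) can be sketched as a queue. This is a deliberately simplified model: the class and field names are hypothetical, and it does not model the four-instruction block read pointer, the extra cycle that entries remain after graduation, or any per-cycle graduation limit.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    tag: str                     # illustrative label for the instruction
    old_phys: Optional[int]      # physical register that held the previous destination value
    completed: bool = False
    exception: bool = False


class ActiveList:
    def __init__(self, capacity: int = 48):   # 48 entries, per the R12000 note above
        self.capacity = capacity
        self.entries = deque()
        self.free_list = deque()

    def append(self, entry: Entry) -> bool:
        """Add a decoded instruction at the bottom; refuse when the list is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(entry)
        return True

    def graduate(self):
        """Remove completed, exception-free instructions from the top, in program order."""
        graduated = []
        while (self.entries and self.entries[0].completed
               and not self.entries[0].exception):
            entry = self.entries.popleft()
            if entry.old_phys is not None:
                # The previous value is discarded; its register becomes free.
                self.free_list.append(entry.old_phys)
            graduated.append(entry.tag)
        return graduated
```

Note that a completed instruction cannot graduate while an older instruction is still incomplete, which is what keeps graduation in program order.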


A.9 Free List and Busy Registers 


A busy-bit table indicates whether or not a result has been written into each of the physical 
registers. Each register is initially defined to be busy when it is moved from the free list to 
the active list; the register becomes available (“not busy”) when its instruction completes 
and its result is stored in the register file. 


The busy-bit table is read for each operand while an instruction is decoded, and these bits 
are written into the queue with the instruction. If an operand is busy, the instruction must 
wait in the queue until the operand is “not busy.” The queues determine when an operand 
is ready by comparing the register number of the result coming out of each execution unit 
with the register number of each operand of the instructions waiting in the queue. 


With a few exceptions, the integer and address queues have integer operand registers, and 
the floating-point queue has floating-point operand registers. 
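

The busy-bit lookup at decode and the later register-number comparison can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, a single queue stands in for the integer, address, and floating-point queues, and the step that moves a register from the free list is reduced to a call that marks it busy.

```python
class IssueQueue:
    def __init__(self, num_physical: int):
        self.busy = [False] * num_physical   # busy-bit table, one bit per physical register
        self.waiting = []                    # (tag, {operand register: ready?}) in decode order

    def allocate_dest(self, phys: int) -> None:
        """A register taken from the free list is busy until its result is written."""
        self.busy[phys] = True

    def enqueue(self, tag: str, operand_regs) -> None:
        """Read the busy bits at decode and store them with the instruction."""
        self.waiting.append((tag, {p: not self.busy[p] for p in operand_regs}))

    def broadcast_result(self, phys: int) -> None:
        """Compare a result's register number with every waiting operand."""
        self.busy[phys] = False
        for _, operands in self.waiting:
            if phys in operands:
                operands[phys] = True

    def ready(self):
        """Tags of instructions whose operands are all not busy."""
        return [tag for tag, operands in self.waiting if all(operands.values())]
```

An instruction enqueued while one of its operands is busy stays out of `ready()` until the matching register number is broadcast.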


A.10 Register Renaming


As it executes instructions, the processor generates a myriad of temporary register results.
These temporary values are stored in register files together with permanent values. The
temporary values become new permanent values when their corresponding instructions
graduate.


Register renaming is used to resolve data dependencies during the dynamic execution of
instructions.


To ensure each instruction is given correct operand values, the logical register numbers
(names) used in the instruction are mapped to physical registers. Each time a new value is
put in a logical register, it is assigned to a new physical register. Thus, each physical
register has only a single value. Dependencies are determined using these physical register
numbers.


An example of register renaming is shown below. The following Doubleword Shift Left
Logical instruction,


opcode   rs   rt   dest   sa   function
spec     -    r2   r3     2    DSLL

DSLL r3,r2,2


has one register operand (r2) plus a 5-bit shift count of value two stored in the sa field; the
value in r2 is shifted left by two and this value is stored in r3.


The physical execution of the instruction above, with register renaming, is given below:

Physical execution       Rename operation
p3 <- p2 shift left 2    r3 = p3


When the DSLL instruction is executed, the logical destination register r3 is assigned a new
physical register, p3, from the free list.


Register renaming also allows exceptions to be handled in a precise manner. Out-of-order
execution means that an instruction can change its result register even before all prior
instructions have been completed. However, if any of the prior instructions cause an
exception, the original register value must be restored. Since each new register value is
loaded into a new physical register (physical register values are not overwritten until the
physical register is placed in the free list), previous values remain unchanged in the original
physical registers and these previous values can be restored.†


An instruction can be aborted up until the time it graduates, and all register and memory 
values can be restored to a precise state following any exception. This state is restored by 
unnaming the temporary physical registers assigned to subsequent instructions. 


Registers are unnamed by writing the old destination register into the mapping table and
returning the new destination register to the free list. Unnaming is done in reverse program
order, in case a logical register was used more than once. After unnaming, the register files
contain only the permanent values which were created by instructions prior to the
exception.


Once an instruction has graduated, all previous values are lost. 
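

Renaming and unnaming can be sketched with a mapping table, a free list, and a program-order record of each rename. The class below is a hypothetical illustration, not the processor's implementation: the initial identity mapping and the register counts are assumptions chosen to keep the example small.

```python
from collections import deque


class RenameMap:
    def __init__(self, num_logical: int, num_physical: int):
        # Assumption: logical register i starts out mapped to physical register i.
        self.mapping = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))
        self.renames = []   # (logical, old_phys, new_phys) in program order

    def rename_dest(self, logical: int) -> int:
        """Assign a new physical register to a logical destination."""
        new_phys = self.free_list.popleft()
        old_phys = self.mapping[logical]
        self.mapping[logical] = new_phys
        self.renames.append((logical, old_phys, new_phys))
        return new_phys

    def graduate_oldest(self) -> None:
        """On graduation the previous value is lost and its register is freed."""
        _, old_phys, _ = self.renames.pop(0)
        self.free_list.append(old_phys)

    def unname_all(self) -> None:
        """Undo every outstanding rename, youngest first (reverse program order)."""
        while self.renames:
            logical, old_phys, new_phys = self.renames.pop()
            self.mapping[logical] = old_phys      # write the old register into the mapping table
            self.free_list.append(new_phys)       # return the new register to the free list
```

The reverse order matters when a logical register was renamed more than once: undoing the renames oldest first would leave the mapping pointing at an intermediate physical register rather than the original one.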


A.11 Nonblocking Loads and Stores 


Loads and stores are nonblocking; that is, cache misses do not stall the processor. All other 
parts of the processor may continue to work on non-dependent instructions while as many 
as four cache misses are being processed. 


† This same technique is used to reverse mispredicted speculative branches.


A.12 Speculative Branching


Normally, about one of every six instructions is a branch. Since four instructions are 
fetched each cycle, the R10000 processor encounters, on average, a branch instruction 
every other cycle, as shown in Figure A-2. 


[Diagram: the instructions fetched in Cycle 0 and Cycle 1, annotated "On average, one out of every six instructions is a Branch"]


Figure A-2 Speculative Branching


When a branch instruction was encountered in previous processors, the instruction fetch 
and instruction issue halted until it was determined whether or not to take the branch. For 
instance, a branch delay slot was designed into the MIPS architecture to handle the intrinsic 
delay of a branch and to keep the pipeline filled. 


Since the processor fetches up to four instructions each clock cycle, there is not enough 
time to resolve branches without stalling the fetch/decode circuitry. The processor 
therefore predicts the outcome of every branch and speculatively executes the branch 
based on this branch prediction. 


The branch prediction circuit consists of a 512-entry RAM, using a 2-bit prediction
scheme: two bits are assigned to a branch instruction, and record how the branch behaved
on its recent occurrences. The four possible prediction states are: strongly
taken, weakly taken, weakly not taken, strongly not taken. If the branch was taken the last
two times, there is a good probability it will be taken this time too (and likewise for
a branch that was not taken).†
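

A common way to realize a four-state scheme of this kind is a saturating two-bit counter per table entry. The sketch below uses that textbook technique; the exact state transitions, the initial state, and the choice of PC bits used to index the 512-entry table are assumptions, since this section does not specify them.

```python
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)


class TwoBitPredictor:
    def __init__(self, entries: int = 512):
        self.entries = entries
        # Assumption: every entry starts in the weakly-not-taken state.
        self.table = [WEAK_NOT_TAKEN] * entries

    def _index(self, pc: int) -> int:
        # Assumption: word-aligned instructions, low-order PC bits select the entry.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        """Predict taken in either of the two 'taken' states."""
        return self.table[self._index(pc)] >= WEAK_TAKEN

    def update(self, pc: int, taken: bool) -> None:
        """Move one state toward the observed outcome, saturating at the ends."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(STRONG_TAKEN, self.table[i] + 1)
        else:
            self.table[i] = max(STRONG_NOT_TAKEN, self.table[i] - 1)
```

With a saturating counter, a branch in the strongly-taken state must be not taken twice in a row before the prediction flips, which is the hysteresis the four states provide.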


The R10000 processor can speculate up to four branches deep. Shadow copies of the 
mapping tables are kept every time a prediction is made, allowing the R10000 processor to 
recover from a mispredicted branch in a single cycle. 


† Simulations have shown the R10000 branch prediction algorithm to be over 90% accurate.


<R12000>
Use of global history in branch-prediction: 


The history register is 8 bits wide, and implements the ‘gshare’ predictor (a reference to
the paper that defines it will be provided later). The history register is updated speculatively,
with a one cycle delay after a prediction before the results are available for use in forming
another prediction index. As mentioned earlier, some programs with small “working set
of conditional branches” benefit significantly from the use of such hashing; however, a
slightly variable number of previously-executed branches may be omitted from the
predictions made for any given branch. This will reduce prediction accuracy somewhat.
The global history register is enabled via bits 26:23 of the Diag Register (CP0 register 22). If
bit 26 is set, branch prediction uses all eight bits of the global history register. If bit 26 is
not set, then bits 25:23 specify a count of the number of bits of global history register to
be used.


Increase in branch prediction table size: 


The table size is increased to 2048 2-bit entries. 


A.13 Logical and Physical Registers


Register renaming (described above) distinguishes between logical registers, which are
referenced within instruction fields, and physical registers, which are actually located in
the hardware register file. The programmer is only aware of logical registers; the
implementation of physical registers is entirely transparent.


Logical register numbers are dynamically mapped onto physical register numbers. This
mapping uses mapping tables which are updated after each instruction is decoded; each
new result is written into a new physical register. This value is temporary and the previous
contents of each logical register can be restored if its instruction must be aborted following
an exception or a mispredicted branch.


Register renaming simplifies dependency checks. Logical register numbers can be
ambiguous when instructions are executed out of order, since a succession of different
values may be assigned to the same register. But physical register numbers uniquely
identify each result, making dependency checking unambiguous.


The queues and execution units use physical register numbers. Integer and floating-point
registers are implemented with separate renaming hardware and multi-port register files.


A.14 Register Files


The R10000 processor has two 64-bit-wide register files to store integer and floating-point
values. Each file contains 64 registers. The integer register file has seven read and three
write ports; the floating-point register file has five read and three write ports.


The integer and floating-point pipelines each use two dedicated operand ports and one
dedicated result port in the appropriate register file. The Load/Store unit uses two dedicated
integer operand ports for address calculation. It must also load or store either integer or
floating-point values, sharing a result port and a read port in both register files.


These shared ports are also used to move data between the integer and floating-point
register files, to store branch and link return addresses, and to read the target address for
branch register instructions.


A.15 ANDES Architecture 


The R10000 processor uses the MIPS ANDES architecture, or Architecture with Non- 
sequential Dynamic Execution Scheduling. 


Appendix B Differences between R10000 and R12000


The following terms are defined in this Appendix: 


Mode bits changed in R12000 

DSD (Delay Speculative Dirty) 

Config Register [22] 

Config Register [23] 

Changes in the Branch Diag Register 

Eliminate traps for Denorm/NaN FP inputs 

Increase in pre-decode buffering 

Increased penalty for indirect branches 

Addition of a Branch Target Address Cache 

Use of global history in branch-prediction 

Increase in branch prediction table size 

Address calculation for load/store instructions uses integer queue 
Load/store dependency is speculatively ignored 

DCache set locking relaxed 

SC refill blocking reduced 

Increased the Way Prediction Table (MRU table) to 16K single-bit entries 
Additional cycles for System Interface transactions 

FP and Integer-Queue Issue Policy 

Active List entries are increased to 48

Cache Error inhibits graduation 

Changed Spare (1, 3) pins to NC (No Connection) 

CacheOp Index Write Back Invalidate (D) also clears Primary Tag 


Summary of the differences


B.1 Mode bits changed in R12000


Table B-1 Mode bits 12:9 SysClkDiv


CODE   Divider   SysClk (for PClk = 300 MHz)
0000   -         Reserved
0001             NOT AVAILABLE, defaults to 150 MHz (was 300 MHz)
0010             NOT AVAILABLE, defaults to 150 MHz (was 200 MHz)
0011             150 MHz
0100             120 MHz
0101             100 MHz
0110             85.70 MHz
0111             75.00 MHz
1000             66.00 MHz
1001             60.00 MHz
1010             54.55 MHz
1011             50.00 MHz
1100             42.85 MHz
1101             37.50 MHz
1110             33.33 MHz
1111             30.00 MHz
Table B-2 Mode bits 21:19 SCClkDiv


CODE   Divider (PClk)   SCClk
000    -                Reserved
001    1                NOT AVAILABLE, defaults to 200 MHz
010    1.5              200 MHz
011    2                150 MHz
100    2.5              120 MHz
101    3                100 MHz
110    -                Reserved
111    4                75 MHz (added for testing silicon)


Table B-3 Mode bits 24:22


CODE   Name                      Comments
000    Reserved
001    Reserved
010    Reserved
011    Reserved
100    Delay Speculative Dirty   fix for speculative store (see B.2)
101    Reserved
110    Reserved
111    Reserved
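

The SCClk column of Table B-2 follows from dividing the 300 MHz PClk by the listed divider, and the same relation can be used to estimate the ratio implied by each SysClk frequency in Table B-1. The sketch below shows both calculations. The function names are hypothetical, and the assumption that dividers are multiples of 0.5 is an inference from the table values, not a statement from this manual.

```python
from fractions import Fraction

PCLK_MHZ = 300  # PClk value used by Tables B-1 and B-2

# Divider column of Table B-2, keyed by mode-bit code (available codes only).
SCCLK_DIVIDER = {
    0b010: Fraction(3, 2),
    0b011: Fraction(2),
    0b100: Fraction(5, 2),
    0b101: Fraction(3),
    0b111: Fraction(4),
}


def scclk_mhz(code: int) -> float:
    """SCClk frequency for an available SCClkDiv code at PClk = 300 MHz."""
    return float(PCLK_MHZ / SCCLK_DIVIDER[code])


def implied_divider(clock_mhz: float) -> float:
    """Estimate the PClk-to-clock ratio behind a listed frequency.

    Assumption: dividers are multiples of 0.5, so the ratio is rounded
    to the nearest half step to absorb the rounding in the printed values.
    """
    return round(2 * PCLK_MHZ / clock_mhz) / 2
```

For example, `implied_divider(85.70)` gives 3.5 and `implied_divider(42.85)` gives 7.0 under the half-step assumption.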


B.2 DSD (Delay Speculative Dirty) 


The Boot Mode bit 24 corresponds to the Config register[24] bit and this controls DSD 
during kernel and supervisor modes. However, the DSD mode can also be enabled in the 
user mode by setting the Status register[24] bit. Config register[24] is read-only and can be 
set only at boot time. 


If the DSD mode is set:


a) R12000 will not set the Dirty bit for a secondary cache block until the store instruction 
is the oldest in the Active List and is about to be executed. (An interrupt could cause a 
case where the dirty bit is set (store is no longer speculative), but the store does not 
immediately graduate. We believe this case should not cause any problem. This mode 
does prevent speculative stores from setting the dirty bit.) 


b) This mode will have slightly lower performance due to the delay in the setting of the 
Dirty bit. This delay will occur just once per block refill from main memory, when it 
is necessary to set the dirty bit. Setting the bit requires about ten cycles; but usually the 
processor will continue to overlap execution of other instructions. Once a block 
becomes dirty in secondary cache, this mode has no performance effect. 


c) In this mode, a miss in secondary cache, due to a store instruction which is not already 
the oldest in the pipeline, will cause a refill to the “clean exclusive” state. A hit to a 
shared line will immediately cause an upgrade to “clean exclusive”. Thus, bus 
operations (which are relatively slow) will still begin speculatively. 


Independent of the DSD mode, R12000 will delay a “cached, non-coherent” load until 
it is the oldest instruction. This change is implemented because a speculative load 
accessing an unmapped “xkphys” address as “cached, non-coherent” might bring data 
into the secondary cache without the proper coherency checks. 


The R12000 makes no changes to prevent speculative refills of cache lines in
shared or clean states, except in the “xkphys” case described above.


B.3 Config Register[22]


Bit 22 of the Config register is ‘SC Data and Tag Corrector disable’. This bit turns off use
of ECC to correct errors in the SC data and tags.


B.4 Config Register[23]


When Bit [23] of the Config register is set, the response that R12000 produces to an
external intervention (shared or exclusive) which hits on a CleanExclusive line is changed.
As before, the state of the line in the cache is changed, and the former state of the line is
sent out on SysState[1:0]. Moreover, when Bit [23] of Config is set, a processor coherency
data response is sent with the state response. In other words, when this bit is set, external
interventions which hit CleanExclusive or DirtyExclusive lines in the Secondary Cache
result in a processor coherency data response.


If Bit [23] is not set, then a data response is generated only for external interventions that
hit a DirtyExclusive line (same behavior as that of R10000).
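

The rule can be summarized as a small decision function. This is only a restatement of the text above in code form; the function name and the string state labels are hypothetical, and states other than CleanExclusive and DirtyExclusive are assumed to produce no data response.

```python
def sends_data_response(line_state: str, config_bit23: bool) -> bool:
    """Does an external intervention hit produce a processor coherency data response?"""
    if line_state == "DirtyExclusive":
        return True                 # same as R10000, regardless of Config[23]
    if line_state == "CleanExclusive":
        return config_bit23         # only when Config[23] is set
    return False                    # assumption: no data response for other states
```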


B.5 Changes in the Branch Diag Register


In R12000 two fields are added to the “Diag Register” (CP0 Register 22). One field is
“ghistory enable”, bits 26:23. The other is “BTAC disable”, bit 27.


The definitions are:


* Ghistory enable:
  * If bit 26 is set, branch prediction uses all eight bits of the global history register.
  * If bit 26 is not set, then bits 25:23 specify a count of the number of bits of
    global history to be used.
  Thus if bits 26:23 are all zero, global history is disabled.


The global history contains a record of the taken/not-taken status of recently executed
branches, and when used is XOR’ed with the PC of a branch being predicted to produce a
hashed value for indexing the BPT. Some programs with small “working set of conditional
branches” benefit significantly from the use of such hashing; some see slight performance
degradation.


* BTAC disable:


If bit 27 is set, the use of the Branch Target Address Cache (BTAC) is disabled. The BTAC
is used to reduce the instruction fetch penalty of taken branches by providing the target
address of fixed-address branch and jump instructions.
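

The field definitions above can be decoded as shown in the following sketch, which also illustrates the XOR hashing of global history with the PC. The bit positions come from the text. Everything else is an assumption made for illustration: the function names, which PC bits feed the index, how the history bits are aligned against them, and the use of a 2048-entry table (the R12000 table size given elsewhere in this manual).

```python
GHIST_ALL_BIT = 26                 # use all eight history bits
GHIST_COUNT_SHIFT = 23             # bits 25:23 hold the count otherwise
GHIST_COUNT_MASK = 0x7
BTAC_DISABLE_BIT = 27
BPT_ENTRIES = 2048                 # R12000 branch prediction table size


def ghistory_bits(diag: int) -> int:
    """Number of global history bits in use for a given Diag register value."""
    if (diag >> GHIST_ALL_BIT) & 1:
        return 8
    return (diag >> GHIST_COUNT_SHIFT) & GHIST_COUNT_MASK


def btac_disabled(diag: int) -> bool:
    return bool((diag >> BTAC_DISABLE_BIT) & 1)


def bpt_index(pc: int, history: int, diag: int) -> int:
    """Hash the PC with the enabled history bits to index the BPT.

    Assumptions: the word-aligned PC supplies the index bits, and the
    history is XORed into the low-order end of that index.
    """
    enabled = history & ((1 << ghistory_bits(diag)) - 1)
    return ((pc >> 2) ^ enabled) % BPT_ENTRIES
```

With bits 26:23 all zero, `ghistory_bits` returns 0, the history mask is empty, and the index depends on the PC alone, which matches the statement that global history is then disabled.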


B.6 Eliminate traps for Denorm/NaN FP inputs 


The R10000 currently takes an Unimplemented Exception when an FPU gets a NaN or
Denorm as an input. R12000 suppresses these traps whenever the FS bit is set in the FCSR.
R12000 simply passes through NaNs and Denorms when the bit is set. This change in no
way affects the handling of QNaNs and Denorms when they are produced; it only changes
the way they are handled when they are received as input operands.


Case of Denorm when the FS bit is set to 1: A Denorm received as an input to the FP unit
is flushed to zero before the FP unit begins to process the operand. The behavior of the unit
(when FS is 1) will be exactly that seen when the input is zero. Specifically, if the zero input
would itself cause a trap (due to divide by zero, for example) then that zero-generated
trap will be taken. When a Denorm is seen at the input, the Inexact bit is set, except in the
cases described below:


The Inexact bit will not be set, even if FS=1 and a Denorm is seen on input, if the other 
input to the FP operation is a value which pre-determines the FP result (e.g. QNaN). 
When the result is not affected by the presence or absence of the Denorm input, the 
result is EXACT. Hence the Inexact bit should not be set, even if Flush to Zero mode 
is ON. 


Case of QNaNs when the FS bit is set to 1: A QNaN received as an input operand for an FP
unit will cause the unit to produce the standard QNaN (which is not necessarily the same as
the input QNaN). Note that FP units will not propagate the QNaN to the output, but will
always produce the same, standard, QNaN.


When the FS bit is set to zero, the behavior will be exactly as in R10000. 
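

The input handling described above can be approximated in software for a single operation. The sketch below models a floating-point add on double-precision values. It is a rough illustration with several assumptions that the text does not settle: the flushed zero keeps the sign of the Denorm, every Python NaN is treated as a quiet NaN, `float("nan")` stands in for the "standard QNaN", only the QNaN example of a pre-determined result is modeled, and the returned flag covers only inexactness caused by flushing, not rounding in the add itself. All names are hypothetical.

```python
import math
import sys


class UnimplementedTrap(Exception):
    """Stands in for the trap taken on NaN or Denorm inputs when FS=0."""


def is_denorm(x: float) -> bool:
    """True for nonzero values below the smallest normal double."""
    return x != 0.0 and not math.isnan(x) and abs(x) < sys.float_info.min


def fp_add_model(a: float, b: float, fs: bool):
    """Return (result, inexact_from_flush) for an add with the given FS setting."""
    if not fs:
        # FS=0: behave as the R10000 does and trap on NaN or Denorm inputs.
        if any(math.isnan(x) or is_denorm(x) for x in (a, b)):
            raise UnimplementedTrap("NaN or Denorm input with FS=0")
        return a + b, False
    if math.isnan(a) or math.isnan(b):
        # A QNaN input pre-determines the result, so Inexact is not set
        # even if the other input is a Denorm.
        return float("nan"), False
    flushed = is_denorm(a) or is_denorm(b)
    if is_denorm(a):
        a = math.copysign(0.0, a)   # assumption: the sign survives the flush
    if is_denorm(b):
        b = math.copysign(0.0, b)
    return a + b, flushed
```

For example, `fp_add_model(5e-324, 1.0, True)` treats the Denorm as zero and reports the flush, whereas the same call with `fs=False` raises the stand-in trap.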


When Denorms or QNaNs are produced by an FP operation, the behavior will be exactly
as in R10000, regardless of the FS bit setting.


Handling of signalling NaNs will be unaffected by this change. Only the handling of input 
quiet NaNs and Denorms will be affected. 


Arithmetic instructions (like add/sub/madd/cvt/div/sqrt/recip/rsqrt) will follow the above 
behavior in all respects. 


There are some instructions that deserve special mention:


* Mov and conditional mov will not be affected by this mode, i.e. no exceptions based
on QNaNs and Denorms. Denorms and QNaNs will be moved without
generating an exception, regardless of the FS state. This behavior is unchanged
from that of R10000.


* When FS=1, the Abs, Neg and Compare instructions will flush Denorm inputs to
zero just as the arithmetic operations do. This is different from the behavior of
the R8000, R4400 and R10000. In all cases where flushing the Denorm to
zero made a difference in the result, an inexact trap will be taken or the
Inexact bit will be set. Compatibility with R4400 and R10000 can be achieved
by setting FS=0.


* The behavior of FP to INT conversion instructions will change in that when
FS=1, an input Denorm will be flushed to zero and the Inexact bit will be set.
With other inputs, FP to INT conversion will not be affected by the FS mode
bit. Previously, R10000 took an unimplemented exception whenever a
conversion from an FP value would result in a value that cannot be represented
in the target format. This will continue to be the case, with the noted exception
of Denorm inputs.


* FP to FP convert instructions will be affected in the same way as arithmetic
operations. That is, cvt FP to FP will not take exceptions on QNaN or Denorm
inputs, if and only if FS=1.


The above changes in R12000 allow compilers and applications to perform more 
aggressive optimizations during loop unrolling, such as if-conversion, speculative load 
execution and speculative code motion. The change is gated 
by the FS bit so that strict IEEE compliance is possible, as before, by setting the FS bit to 
zero. 
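
The input-flushing rule described above can be sketched in a few lines of Python. This is only an illustrative model for double-precision inputs, not a description of the hardware: the function name flush_denorm_input is hypothetical, and preserving the sign of the flushed zero is an assumption that the text above does not state.

```python
import math
import sys


def flush_denorm_input(x: float, fs: int) -> tuple[float, bool]:
    """Return (operand as seen by the FP unit, inexact flag).

    Illustrative model only. A Denorm (subnormal) input is a nonzero
    value whose magnitude is below the smallest normal double.
    """
    is_denorm = x != 0.0 and abs(x) < sys.float_info.min
    if fs == 1 and is_denorm:
        # Assumption: the flushed zero keeps the sign of the input.
        return math.copysign(0.0, x), True
    # FS=0, or a normal/zero/NaN/infinite input: operand is unchanged.
    return x, False
```

With FS=0 the operand passes through unchanged, which corresponds to the strict IEEE-compliant setting described above.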


B.7 Increase in pre-decode buffering 


Up to 12 instructions may be buffered before being decoded. This should normally be 
invisible to the end user, but can be important when debugging systems in uncached mode, 
since fetch and decode are now further decoupled. 


B.8 Increased penalty for indirect branches 


Indirect branches, which were already an expensive operation, have become even more so. 
Instruction fetch now stalls for a minimum of 5 cycles, rather than the 4 for the R10000. 
This additional cycle of delay is seen by both jr and jalr instructions. 


B.9 Addition of a Branch Target Address Cache 


This 32-entry two-way set-associative cache holds the target addresses of previously-taken 
branches. When a branch is executed, a hit in the BTAC eliminates the one-cycle fetch 
bubble which the R10000 experiences for every taken branch. However, if a branch which 
hits in the BTAC is actually predicted not-taken, then a one-cycle fetch bubble is introduced 
where none was present before. Performance simulations indicate that the BTAC is a net 
win, but because of its “mixed-blessing” nature, a mechanism has been provided to disable 
it via software. (See the description of changes to the Diag register.) 
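
The trade-off can be made concrete with a small Python sketch that counts fetch bubbles per branch. It models only the two cases described above; the function name and its parameters are hypothetical, and all other pipeline effects are ignored.

```python
def fetch_bubbles(predicted_taken: bool, btac_hit: bool, btac_enabled: bool = True) -> int:
    """Fetch bubble cycles for one branch, per the two cases in the text."""
    hit = btac_enabled and btac_hit
    if predicted_taken:
        # A BTAC hit removes the one-cycle bubble of a taken branch.
        return 0 if hit else 1
    # A BTAC hit on a branch predicted not-taken introduces a bubble.
    return 1 if hit else 0
```

Disabling the BTAC (btac_enabled=False) reproduces the R10000 behavior in this model: one bubble for every taken branch and none otherwise.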


B.10 Use of global history in branch-prediction 


The history register is 8 bits wide, and implements the ‘gshare’ predictor (a reference to the 
paper that defines it will be provided later). The history register is updated speculatively, with 
a one-cycle delay after a prediction before the results are available for use in forming 
another prediction index. As mentioned earlier, some programs with a small “working set of 
conditional branches” benefit significantly from the use of such hashing; however, a slightly 
variable number of previously-executed branches may be omitted from the predictions 
made for any given branch. This will reduce prediction accuracy somewhat. The global history 
register is enabled via bits 26:23 of the Diag Register (CP0 register 22). If bit 26 is set, 
branch prediction uses all eight bits of the global history register. If bit 26 is not set, then 
bits 25:23 specify the number of bits of the global history register to be used. 
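
The Diag Register decoding above, combined with a gshare-style index, might be sketched as follows. This is a hedged illustration only. The function names are hypothetical, the 2048-entry table size is taken from section B.11, and the choice of PC bits and of where the history is XORed in are assumptions, since this document does not define the exact hash.

```python
HISTORY_WIDTH = 8       # width of the global history register (from the text)
TABLE_ENTRIES = 2048    # prediction table size, see B.11


def history_bits_enabled(diag: int) -> int:
    """Number of global history bits in use, from Diag Register bits 26:23."""
    if (diag >> 26) & 1:
        return HISTORY_WIDTH          # bit 26 set: use all eight bits
    return (diag >> 23) & 0x7         # otherwise bits 25:23 give the count


def gshare_index(pc: int, history: int, diag: int) -> int:
    """Illustrative gshare index: PC bits XORed with the enabled history bits."""
    n = history_bits_enabled(diag)
    masked_history = history & ((1 << n) - 1)
    # Assumption: word-aligned PC bits select the entry, and the history
    # is XORed into the low-order index bits.
    pc_bits = (pc >> 2) & (TABLE_ENTRIES - 1)
    return pc_bits ^ masked_history
```

With bits 26:23 all zero, no history bits are used and the index depends on the PC alone, which corresponds to disabling the global history feature in this model.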


B.11 Increase in branch prediction table size 


The table size is increased to 2048 2-bit entries. 


B.12 Address calculation for load/store instructions uses integer queue 


When load, store, cacheop, or prefetch instructions are decoded, they are sent to both the 
AQ and IQ units. The IQ treats the address-calculate unit as a third “ALU” and issues 
instructions to it. When an instruction completes address calculation, the results are 
forwarded to the AQ. Unlike in R10000, if an address instruction must be retried for any 
reason, address calculation is not redone. If the address queue is full, but the integer 
queue has free entries at the time a load/store instruction is decoded, the load/store is sent 
only to the integer queue. When the address queue has an available entry, the calculated 
address is forwarded to that entry and the remainder of the load/store execution continues. 


B.13 Load/store dependency is speculatively ignored 


When a load follows a store in program order, and the address of the load is known to the 
Address Queue (AQ) before the address of the store, then the AQ may speculatively issue 
the load to tag-check and data access. When the address of the store is determined, the AQ 
can undo the effects of the load through the use of the “soft-exception” mechanism. Since 
almost all loads which are actually dependent on previous stores use the same registers to 
form their addresses, normally either the two instructions are independent, or their 
addresses are resolved in program order, so the soft-exception should occur rarely. 


B.14 DCache set locking relaxed 


In R10000, when an AQ entry accesses a DCache line, that line is locked into the cache until 
the entry graduates, so that the line will not be removed from the cache until the access 
completes. If another entry which needs to access exactly the same line arrives in the AQ 
before the first completes, the two may share the lock. In this way, a line is locked in the 
cache until all accesses to it complete. In order to prevent a deadlock from arising, whenever 
a cache line is locked in this way, only the oldest AQ entry can obtain a lock on the other 
“way” of the same cache set, thus ensuring that forward progress can be made. This 
algorithm can cause problems, because often the oldest entry in the AQ is the one which 
already owns the lock on the first way, thus ensuring that no other entries can access the 
second way of the cache for that set index. For some algorithms, most notably FFTs, this 
can cause severe performance degradation. R12000 allows an entry to obtain the lock on 
the second way of a set if it is the oldest entry which does not already own a lock. Thus, any 
entries which have already acquired a lock, including those locking the first way, will not 
prevent another, younger, entry from accessing that second way. 
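
The difference between the two locking rules can be illustrated with a short Python sketch. The AQEntry class and both function names are hypothetical, and the model covers only the eligibility test for the second way of one cache set, not the full address queue.

```python
from dataclasses import dataclass


@dataclass
class AQEntry:
    age: int          # smaller value = older in program order
    owns_lock: bool   # entry already holds a lock on a way of this set


def may_lock_second_way_r10000(entry: AQEntry, queue: list) -> bool:
    """R10000 rule: only the oldest AQ entry may lock the other way."""
    return entry is min(queue, key=lambda e: e.age)


def may_lock_second_way_r12000(entry: AQEntry, queue: list) -> bool:
    """R12000 rule: the oldest entry that does not already own a lock."""
    candidates = [e for e in queue if not e.owns_lock]
    if not candidates:
        return False
    return entry is min(candidates, key=lambda e: e.age)
```

For a queue in which the oldest entry already owns the lock on the first way, the R10000 rule rejects every younger entry, while the R12000 rule admits the oldest of the entries that hold no lock.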


B.15 SC refill blocking reduced 


In R10000, during the time that an SCache line is being refilled from the system interface via 
the “incoming buffer” (IB), no other accesses to the SCache are allowed. If the external 
interface sees an ACK to a line that is being refilled before the last words of the SCache line 
are received by R10000, several cycles can elapse during which SCache 
access is blocked. By breaking the SCache refill transaction into 64-byte blocks, and 
allowing other requests to proceed during breaks between the blocks, this effect can be 
reduced. R12000 pulls in SCache lines with two “pause points.” The first occurs when 
R12000 receives the ACK for a request. If the first two quad-words are already valid in the 
Incoming Buffer at that time, then R12000 will proceed to refill the SCache with those two, 
and forward the results to the DCache or ICache at the same time as normal. The next two 
quad-words will be refilled as they return, thus continuing to block any other access to the 
SCache just as in R10000. If, however, when the initial ACK is received, the first two are not valid 
(i.e., either 0 or 1 quad-words are valid at that time), then R12000 will “pause” the SCache 
refill and wait for both of them to be brought in to the IB. Once the first half is filled in to 
the SCache, R12000 will again check the IB to see if an additional 3 quad-words are valid 
(thus 7 out of the 8 quad-words in the SCache line should have arrived into the IB). Until 
that is the case, R12000 will again “pause” the SCache refill and allow other accesses to 
reach the SCache. These two pauses allow for other requests to slip in during an SCache 
refill. Using only two pauses both simplifies the logic and reduces bus turnarounds. 


B.16 Increased the Way Prediction Table (MRU table) to 16K single-bit entries 


The size of the table has been increased to 16K entries, so that 4MB caches with 128B lines 
or 2MB caches with 64B lines can be fully mapped. 
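
The arithmetic behind the two configurations can be checked with a short Python sketch. It assumes a two-way set-associative secondary cache, which is consistent with the single-bit way prediction entries described above. The function name scache_sets is hypothetical.

```python
MRU_ENTRIES = 16 * 1024   # way prediction table size from the text


def scache_sets(cache_bytes: int, line_bytes: int, ways: int = 2) -> int:
    """Number of cache sets, each needing one way-prediction bit."""
    return cache_bytes // (line_bytes * ways)


for size_mb, line_bytes in ((4, 128), (2, 64)):
    sets = scache_sets(size_mb * 1024 * 1024, line_bytes)
    # A configuration is fully mapped when every set has its own entry.
    print(size_mb, "MB,", line_bytes, "B lines:", sets, "sets,",
          "fully mapped" if sets <= MRU_ENTRIES else "not fully mapped")
```

Both configurations yield 16384 sets, which matches the 16K entries of the enlarged table.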


B.17 Additional cycles for System Interface transactions 


All transactions which go through the system interface unit (in particular, SCache refills 
and writebacks) have one additional CPU clock of latency added to them. 


B.18 FP and Integer-Queue Issue Policy 


The integer and floating-point queues are altered so that they are now composed of two 
8-entry banks. Instructions are issued into the two banks in an alternating fashion. Each bank 
independently nominates instructions for the functional units. For each FU, the banks 
nominate the oldest instruction they contain which is ready to execute. If both banks 
nominate an instruction for a given FU, a winner is chosen by a priority bit which alternates 
between the two banks on each cycle. 
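
The nomination and arbitration steps for a single functional unit might be modeled as below. This is a simplified sketch: the names nominate and arbitrate are hypothetical, entries are reduced to (age, ready) pairs, and the assumption that the priority bit favors bank 0 on even cycles is not taken from the text.

```python
from typing import Optional

Entry = tuple[int, bool]   # (age, ready); smaller age = older instruction


def nominate(bank: list[Entry]) -> Optional[int]:
    """Age of the oldest ready instruction in one bank, or None."""
    ready = [age for age, is_ready in bank if is_ready]
    return min(ready) if ready else None


def arbitrate(bank0: list[Entry], bank1: list[Entry], cycle: int) -> Optional[tuple[int, int]]:
    """Return (winning bank, age of issued instruction) for one FU."""
    n0, n1 = nominate(bank0), nominate(bank1)
    if n0 is None and n1 is None:
        return None
    if n1 is None:
        return (0, n0)
    if n0 is None:
        return (1, n1)
    # Both banks nominated: the alternating priority bit decides.
    # Assumption: the bit favors bank 0 on even cycles.
    favoured = cycle % 2
    return (favoured, n0 if favoured == 0 else n1)
```

Note that in this model the winner is not necessarily the oldest ready instruction overall, since the priority bit, not age, resolves a conflict between the two banks.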


B.19 Active List entries are increased to 48 


The active list has been enlarged so that it now contains 48 entries. 


B.20 Cache Error inhibits graduation 


When a cache error is detected, all instruction graduation is inhibited on the following 
cycle. Since cache errors are rare, and an exception will occur soon afterwards, this should 
have minimal impact on performance. 


B.21 Changed Spare(1, 3) pins to NC (No Connection) 


The Spare(1, 3) pins, shown in the User Manual, Rev 2.0, page 43 as tied to Vss through a 100 ohm 
resistor, are used in R12000 for diagnostic purposes and thus, for R12000, should not be 
connected to anything. 


B.22 CacheOp Index Write Back Invalidate(D) also clears Primary Tag 


As a result of the CacheOp Index Write Back Invalidate(D) instruction, the Primary Tag is 
also cleared (set to zero) in addition to setting the cache state bits to zeros (invalid) as 
described in the VR5000, VR10000 User’s Manual: Instruction. 


B.23 Summary of the differences 


Higher operation frequency. 

Core operating voltage for R12000: 2.6 V. 

Max case temperature for R12000: 70°C. 

Less power consumption. 

Increased options for PClk to SysClk and PClk to SCClk ratios. 

• Added boot-time mode bits to allow processor upgrade without change in 
system interface and secondary cache interface frequency. 

Added a mode in which the side effects of “Speculative Load/Stores” are avoided. 

• Speculative load/stores could cause problems in a system with non-coherent 
I/O. This mode prevents the behavior that causes the side effects, with some 
trade-off in performance. This mode is optional and can be selected at boot time. 

Added option to disable “SCData and Tag Corrector”. 

Processor provides data-response even if the external intervention hits a Clean Exclusive line 

• (i.e. processor is the owner for both CEx and DEx lines). 

Added an optional Branch Target Address Cache to reduce instruction fetch penalty. 

• Since there are trade-offs, this feature can be disabled. 

Added an optional “Global History Table” to improve branch prediction. 

• Since not all programs benefit from this feature, it can be disabled. 

Added an option to eliminate traps for Denorm/NaN FP inputs. 

• This allows the compilers and applications to do more aggressive 
optimization. The change is optional if IEEE compliance is needed. 

Quadrupled the branch prediction table size. 

Doubled the MRU table for SCache way prediction to improve SCache hit rate. 

Improved performance monitoring system. 

• Detailed in a separate document. 

Increased Active list to 48 entries to improve performance. 

Changed the Spare(1,3) pins to NC (No Connection). 

Other miscellaneous changes to improve performance and simplify logic. 


Numerics 


16-word, cache refill 
read sequence 93 


write sequence 98 


32-bit 
address space 305 
mode, TLB entry format 317 


32-word, cache refill 
read sequence 93 


write sequence 98 
3-state, signal 63 


4-word, cache refill 
read sequence 91 


write sequence 96 
599CLGA, see CLGA 
64-bit 

address space 305 

mode, TLB entry format 317 


8-word, cache refill 
read sequence 92 


write sequence 97 


A 


AC electrical specifications 222 
asynchronous inputs 223 
delay time 223 
hold time 223 
maximum operating conditions 222 
setup time 223 
test specification 222 
timing 
secondary cache 222 
System interface 222 


access privileges, address space 314 
ACK completion response 152 
ACK, signal 112 

active list, definition of 359 

add unit, FPU 291 


address 
encodings, mode 305 
Kernel mode 310 
mapping 
Kernel mode 310 
Supervisor mode 308 
User mode 306 
mode 305 
page 316 
queue 26, 33 
instruction graduation 33 
issue ports 33 
number of entries 33 
number of instructions written per cycle 33 
organized as FIFO 33 
sequencing 33 
space 
access privileges 314 
kernel 305 
supervisor 305 
user 305 
virtual 305 
Supervisor mode 308 
translation 318 
User mode 306 


Address Error exception 328 

Address Space Identifier, see also ASID 318 
address/data bus signals 61 

AdEL, indication 328 

AdES, indication 328 
algorithms 
cache, five types of 74, 78 
aliasing, virtual 89 
allocate request number requests, external 156 
ALU (arithmetic logic unit) 
No. 1 40 
No. 2 40 
ALU1 29, 32 
ALU2 29, 32 
ANDES, Architecture with Non-sequential Dynamic Execution Scheduling 24, 364 
arbitration protocol, System interface 130 
arbitration rules, System interface 131 
arbitration signals 61 
arbitration, cluster bus 104 
Architecture with Non-sequential Dynamic Execution Scheduling, see also ANDES 364 
arithmetic instructions, FPU 300 
arithmetic logic unit, see also ALU 
array 84 
array, page table entry (PTE) 247 
ASID (Address Space Identifier) 
context switch 318 
relationship to Global (G) bit in TLB entry 318 
stored in EntryHi register 318 
ASID, field 251 
asynchronous inputs, AC electrical specification 223 
auto-increment read, cache test mode 355 
auto-increment write, cache test mode 353 


B 


Bad Virtual Address register (BadVAddr) 250 
BadVAddr register 247, 265, 328 
BadVPN2, field 247, 265 
BD, (branch delay) bit 258, 260 
BE, (memory endianness) bit 262 
BEV, (boot exception vector) bit 256 
BEV, bit 195, 320 
block 
instruction cache 
size 
primary data cache 68 
primary instruction cache 66 
secondary cache 71 
block data transfers 116 
external block data responses 116 
processor block write requests 116 
processor coherency data responses 116 
boundary scan register, JTAG 214 
BPIdx, field 268 
BPMode, field 267 
BPOp, field 268 
BPState, field 268 
branch 
determining next address 39 
instruction, limits on execution 39 
prediction 36, 51, 362 
prediction rates, improving 43 
speculative 362 
unit 30, 39 
BRCH, field 267 
BRCV, field 267 
BRCW, field 267 
Breakpoint exception 338 
BSIdx, field 267 
buffer 
cached request 111 
cluster request 111 
incoming 111, 112 
outgoing 111, 113 
uncached 111, 114 
bus 
SysAD 124 
SysCmd 117 
SysResp 127 
SysState 126 
Bus Error exception 334 
busy-bit table 360 
bypass register, JTAG 213 


C 


C, (coherency attribute) bit 245 

cache 24 
algorithms 74 


and processor requests 78 


primary data cache 29 


secondary cache 31 


cacheable coherent exclusive on write, description of 75 


cacheable coherent exclusive, description of 75 
cacheable noncoherent, description of 75 
fields, encoding of 74 
for kseg0 address space 74 
for mapped address space 74 
for xkphys address space 74 
uncached accelerated, description of 76 
uncached, description of 75 
where specified 74 
associativity 65 
block ownership 79 
misses 47 
nonblocking 45, 47 
ordering constraints 37 
pages 316 
primary 24 
primary data 29 
block size 68 
changing states 69 
description of 68 
diagram, state 70 
error handling 199 
index and tag 69 
interleaving 52 
refill 51 
state diagram 70 
states 69 
subset of secondary cache 69 
write back protocol 68 
primary instruction 29 
block size 66 
description of 66 
diagram, state 67 
error handling 198 
error protection 198 
index and tag 66 
refill 51 
state diagram 67 
states 66 
rules, ownership of a cache block 79 
secondary 24 
associativity 31, 71 
block size 71 
block state 89 
blocks 31 
changing states 72 
clock domain 179 
data array 82 
data array width 84 
description of 71 
diagram, state 72 
ECC 31 

error handling 200 
index and tag 71 
indexing 84 


indexing the data array 84 
indexing the tag array 85 


interface frequencies 83 


sizes 31 


specifying block size 82 
specifying cache size 82 


state diagram 72 
states 71 


tag 88 


tag and data array ECC 82 


tag array 82 

way prediction 86 

way prediction table 85 

write back protocol 71 
strong ordering 

example of 38 
structure, two-level 65 


Cache Error exception 195, 333 
precision 195 
prioritization 195 


Cache Error handler 195 


CACHE instruction 
support for I/O 174 


CACHE instructions 196 
cache miss stalls 47 


cache test mode 
entry 349 
exit 350 


cacheable coherent exclusive on write, cache algorithm 74, 75 


cacheable coherent exclusive, cache algorithm 74, 75 


cacheable noncoherent, cache algorithm 74, 75 


cached request buffer 111 


CacheErr register 195, 196, 198, 199, 280 


capacitors, decoupling 225 
cause bits, FPU 300 


Cause register 127, 128, 250, 258, 260 


Cause, field (FP) 300 

CE, bit 255, 256, 258 
CH, bit 256 

chip revisions, R10000 261 


ckseg0 space 314 
ckseg1 space 314 




Appendix C Index 


ckseg3 space 314 
cksseg space 314 
CLGA (ceramic land grid array) 228 
electrical characteristics 229 
layout 228 
mechanical characteristics 228 
package 228 
pinout 232 
thermal characteristics 230 
clock 
domain 
in secondary cache 179 
internal processor clock domain 177 
secondary cache clock domain 177 
System interface clock domain 177 
signal 
SysClk 177 
SysClkRet 178 
signals, overview of 61 
clock divisor, system interface 102, 348 
cluster bus 56, 104 
operation 170 
cluster coordinator 103, 104 
cluster request buffer 111 
coherency conflicts 165 
coherency protocol, directory-based 175 
coherency request, external 160, 162 
coherency schemes 56 
coherency, System interface 
external intervention exclusive request 163 
external intervention shared request 163 
external invalidate request 163 
CohPrcReqTar, mode bit 124, 171, 174, 186 
cold reset 181 
sequence 184 
Cold Reset exception 320 
Compare register 128, 250 
completing, an instruction 359 
completion, definition of 361 
condition bit dependencies 36 
Condition, field (FP) 300 
Config register 262 
conflicts 
coherency 165 
TLB, avoiding 318 
Context register 247, 265 
context switch 318 
control registers, FPU 298 
controller, TAP 212 
coordinator, cluster 103 
COP1 instructions 345 
COP2 instructions 345 
Coprocessor 0, see also CP0 241 
Coprocessor 1 see also CP1, COP1 257 
Coprocessor 2 see also CP2, COP2 257 
Coprocessor 3 see also CP3, COP3 257 
Coprocessor Unusable exception 340 
correctable error 192 
Count register 128, 250 
CP0 (coprocessor 0) 241 
instructions 345 
registers, list of 242 
csseg space 309 
CT, bit 262 
CTM, mode bit 188, 349, 350 
CU, (coprocessor usability) field 252, 254, 257 


D 


D, (dirty) bit 245 
data cache 
see also cache, primary data 68 
data dependencies 42 
data path, secondary cache 31 
data quality indication 114 
DBRC, field 267 
DC characteristics of I/O signals 221 
DC electrical specifications 218 
input and output 221 
input level sensing 220 
maximum operating conditions 219 
mode definitions 220 
power supply levels 218 
Vref, voltage reference 220 
DC power supply levels 218 
DC voltage, reference 220 
DC, (data cache size) field 262 
DCOk, signal 58, 182, 219, 220, 224 
DE, bit 196, 256 
debugging, and Watch registers 264 
decoding, an instruction 359 
decoupling capacitance 225 
delay times, AC electrical 223 
dependencies 

condition bit 36 

exception 37 

instruction 35 

memory 36 

pipeline 35 

register 36, 363 
DevNum, mode bits 186 
Diagnostic register 267 
directory-based coherency protocol 175 
divide unit, FPU 291 
division by zero, FP 300 
divisor, clock, system interface 102, 348 
DN, (device number) field 262 
Done, bit 32 
done, see also completion 361 
DP, (primary data cache parity) field 279 
DS, (diagnostic status) field 253, 254, 255 
duplicate tags, external 54 
dynamic issue 35, 359 
dynamic scheduling 359 


E 
EC, field 262 


ECC (error correcting code) 
matrix for secondary cache data array 201 
matrix for secondary cache tag array 203 
matrix for System interface 207 


register 279 


secondary cache 31 
ECC register 91, 96 
ECC, field 279 
efficiency, program, suggestions for increasing 43 


electrical specifications 


AC 222 
DC 218 


Enable, field (FP) 300 
enable/output delay 223 
EntryHi register 251, 317 
ASID field in 318 


EntryLo registers, and FrameMask register 266 
EntryLo0 register 245, 317 

EntryLo1 register 245, 317 

EPC register 260 

ERL, (error level) bit 195, 255, 304 


ERR completion response 152 
ERR, signal 112 


error 
correctable 192 
handling 191 
protocol 209 
levels, in the Status register 304 
protection 191 
schemes used in R10000 197 
protection schemes, used in R10000 
ECC 197 
parity 197 
sparse encoding 197 
uncorrectable 193 
handling an 195 
limiting the propagation of 194 
units that detect and report uncorrectable errors 195 
error correcting code see also ECC 197 
Error Exception Program Counter (ErrorEPC) register 290 
Event, field 271 
EW, bit in CacheErr register 196 


ExcCode, field 258, 259 


exception levels, in the Status register 304 


exception processing, CPU 
exception types 

Address Error 328 
Breakpoint 338 
Bus Error 334 
Cache Error 195, 333 
Coprocessor Unusable 340 
Floating-Point 341 
Integer Overflow 335 
Interrupt 343 
NMI 327 
Reserved Instruction 339 
Soft Reset 325 
System Call 337 
TLB 329 
TLB Invalid 329, 331 
TLB Modified 329, 332 
TLB Refill 329, 330 
Trap 336 

Virtual Coherency 333 
Watch 342 


exception vector location 


Reset 320 
TLB Refill 320 
exception vector selection 321 


precise handling 37 
priority of 321, 323 
TLB refill vector locations 322 


Exception Program Counter (EPC) register 260 
executing, an instruction 359 

execution order 35 

execution pipelines 26 

execution units, iterative 364 

execution, speculative 42, 362 

EXL, (exception level) bit 255, 260, 304, 320 
external ACK completion response 112, 152 
external agent 54, 55, 101 


also referred to as cluster coordinator 103 


connecting to 103 
external allocate request number request protocol 156 


external block data response 116, 150 
protocol 149 


external coherency conflicts 166 
external coherency request latency 168 
external coherency requests, action taken 164 


external completion response 153 
protocol 152 


external double/single/partial-word data response protocol 151 
external duplicate tags, support for 174 


external interface 31 
memory accesses 52 
priority operations 52 


external interrupt request 127 
protocol 158 


external intervention exclusive request 163 


external intervention request 155 
protocol 155 


external intervention shared request 163 


external invalidate request 163 
protocol 157 


external NACK completion response 152 


external request 102, 109 
protocol 154 
external response 102, 109 
protocol 149 


F 


fetch pipeline 26, 39 
fetching, an instruction 359 


FGR (Floating-Point General register) 
32-bit operations 294 
5-bit select 294 
64-bit operations 294 
load operations 295 
operations 294 
Status register FR bit 294 
store operations 295 


Fill, field 251 
flag 
uncorrectable error 112 
Flag, field (FP) 300 
floating-point 
adder 40 
adder pipeline 26 
divide 40, 292 
multiplier 40 
pipeline 27 
queue 26, 33 
instructions written each cycle 33 
number of allowable entries 33 
ports 33 
sequencing 33 
registers 294 
rounding mode 301 
square root 40 


Floating-Point exception 341 

Floating-Point Status register see also FSR 299 
Floating-Point Unit, see also FPU 291 

flow control 115 


external data response 115 
external request 115 
processor coherency data response 115 
processor eliminate request 115 
processor read request 115 
processor upgrade request 115 
processor write request 115 
signals 61 

format, TLB entry 317 


FPU 291 
Active List, control of FSR 299 
add unit 291 


arithmetic instructions 300 
cause bits, FSR 300 
changing rounding mode using a CTC1 301 
compare 300 
condition bits 300 
control registers 298 
divide unit 291 
FGRs (general registers) 294 
FSR, (Status register in FPU) 299 
graduation, control of FSR 299 
latency 291 
logic diagram 292 
move to floating-point 297 
multiply unit 291 
operations 292 
queue 
controlling units 293 
move unit, FPU 292 
read ports 292 
register file 292 
repeat rate 291 
rounding modes 301 
serial dependency circuit 297 
square-root unit 291 


FR, field 254 
FrameMask register 246, 266 
free list 360 


freeing the request number, with completion response 152 


FSR (Floating-Point Status register) 
cause bits 300 
condition bits 300 
division by zero 300 
enable bits 300 
flag bits 300 
inexact result 300 
invalid operation 300 
load exceptions 301 
loading the FSR 301 
overflow 300 
RM, round to minus infinity 301 
RN, round to nearest representable value 301 
RP, round to plus infinity 301 
RZ, round toward zero 301 
underflow 300 
unimplemented operation 300 

functional unit 29 
branch 30 
floating-point adder 29 
floating-point multiplier 29 
instruction decode and rename 30 
integer ALU 29 

iterative 29 

Load/Store Unit 29 


G 
G, (Global) bit in TLB 246, 318 


gathering data, in identical mode 114 
gathering data, in sequential mode 114 
global processes (G bit in TLB) 318 


graduation 
definition of 361 


of an instruction 359 


Grant parking 130 


H 


hardware emulation, support for 176 
hardware interrupts 127 
hold times, AC electrical 223 


I 


I/O signals, DC characteristics 221 

I/O, support for 174 

IC, (instruction cache size) field 262 
IE, (interrupt enable) bit 255 

IE, bit 271 

IM, (interrupt mask) field 253 
implementation number, R10000 processor 261 
incoming buffer 111, 112 

Index Load Tag instruction 94 

Index register 243 

Index Store Data CACHE instruction 96 
Index Store Tag CACHE instruction 99 
indexing, the secondary cache 84 
inexact result (FP) 300 

initialization 181 

input voltage levels, maximum 224 


instruction 
CACHE, see also CACHE instructions 196 
completion 42, 359 
COP0 see also CP0 345 
COP1 345 
COP2 345 
decoding 359 
dependencies 35 
DMFC1 300 
execution 359 
fetching 359 
graduation 359 
issue 42, 359 
superscalar 42 
latencies 49 
MFC1 297, 300 
prefetch 47 
queue 32, 39 
repeat rates 49 
serializing 45 
SWC1 297 
SYNC 77, 170 
instruction cache, block size see also cache, primary instruction 66 
instruction register, JTAG 213 
integer 
queue 32 
branch instructions 32 
divide instructions 32 


multiply instructions 32 
ports 32 


shift instructions 32 

integer ALU pipeline 26 
Integer Overflow exception 335 
integer queue 26 
interface, external 31 
internal coherency conflicts 165 
internal processor clock domain 178 
Interrupt exception 343 
interrupt mask, bit 250 
Interrupt register 127 
interrupt request, external 127 
interrupts 127 

hardware 127 

nonmaskable 128 

software 128 

timer 128 
invalid operation, FP 300 
invalidate request, external 157 
IP, (interrupt pending) bit 258, 279 
ISA (Instruction Set Architecture) 

MIPS I 22 

MIPS II 22 

MIPS III 22 

MIPS IV 22, 294 


issue, dynamic 359 
issuing, an instruction 359 
iterative execution units 364 
ITLB (instruction TLB) 318 
ITLBM, field 267 


J 


JTAG 
boundary scan register 214 
bypass register 213 
Capture-DR state 214 
instruction register 213 
interface 211 
instruction register 213 
JTCK signal 212 
JTDI signal 212 
JTDO signal 212 
JTMS signal 212 
Tap controller 212 
test access port 212 
Shift-DR state 213, 214 
signals 63 
Update-DR state 214 
Update-IR state 213 
JTCK, signal 63, 64, 212 
JTDI, signal 63, 64, 212, 213 
JTDO, signal 63, 212, 213 
JTLB (joint TLB) 318 
JTMS, signal 63, 64, 212 


K 


K0, field 262 

Kernel mode 304 
address mapping 310 
ckseg0 space 314 
ckseg1 space 314 
ckseg3 space 314 
cksseg space 314 
kseg0 space 311 
kseg1 space 311 
kseg3 space 311 
ksseg space 311 
kuseg space 311 
operations 310 
xkphys space 312 
xkseg space 314 
xksseg space 312 
xkuseg space 312 


kseg0 space 311 


Kseg0CA, mode bits 186 
kseg1 space 311 

kseg3 space 311 

ksseg space 311 

KSU, field 253, 255, 320 
kuseg space 311 


KX, bit 254, 304 


L 
latency 49 


accessing secondary cache 51 
definition of 358 


external coherency request 168 


FPU 291 


least-recently used replacement algorithm (LRU) 29 
level sensing, input 220 

list, free 360 

LLAddr register 263 

load operations, FPU registers 295 

Load/Store Unit pipeline 26 


loads 
nonblocking 361 
logic diagram, FPU 292 
logical register 
initialization (necessity for) 182 
logical register, see also physical register 363 
LRU (least-recently used) replacement algorithm 29 


M 


mapped, virtual address region 305 
mapping table 363 
Mask, field 248 


master state 103 
and flow control 115 


matches, multiple, in TLB 318 

MemEnd, mode bits 187 

memory dependencies 36 

memory ordering 37 

memory protection 316 

MIPS III ISA, disabled and enabled 246 
MIPS IV, instruction set see also ISA 344 
miscellaneous system signals 62 


mispredicted branch 51 
mode 


addressing 305 
addressing, encodings 305 
Kernel mode 305 

Supervisor mode 305 
User mode 305 
operating 304 


mode bits 186 


CohPrcReqTar 124, 171, 174, 186 
CTM 189, 349, 350 

DevNum 186 

Kseg0CA 186 

MemEnd 187 

ODrainSys 189, 220 

PrcElmReq 145, 175, 186 
PrcReqMax 115, 135, 137, 143, 147, 186 
SCBlkSize 71, 82, 114, 187 
SCClkDiv 83, 178, 182, 187 
SCClkTap 179, 188 

SCCorEn 187, 201, 203 

SCSize 71, 82, 187 

SysClkDiv 102, 178, 182, 187 


mode definitions, DC 220 
MP, field 267 

MTC0, instruction 91 
multiple matches, in TLB 318 


multiplier pipeline 26 
multiply unit, FPU 291 


multiprocessor system 55 


multiprocessor system, using dedicated external agents 106 


arbitration 133 
cluster bus 55 
with external agent 55 


multiprocessor system, using the cluster bus 107 


N 


NACK completion response 152 
NACK, signal 112 

NMI see also nonmaskable interrupt 290 
NMI, bit 255, 256 


nonblocking cache 47 


nonblocking, loads and stores 361 
Nonmaskable Interrupt (NMI) exception 128, 320, 327 


normal read, cache test mode 354 


normal write, cache test mode 352 


NT compatibility, LLAddr register 263 
number, request 109 


O 


ODrainSys, mode bit 188, 220 
offset, in page address 316 
operating conditions, AC 222 


operating mode 


Kernel 304, 310 
Supervisor 304, 308 
User 304, 306 


operations, FPU 292 

ordering, memory 37 

ordering, strong 37 

out of program order, execution 358 
outgoing buffer 111, 113, 114 
outstanding requests 109 

overflow (FP) 300 


P 


package configuration 227 
package, see CLGA 
PAddr0, field 264 
PAddr1, field 264 
page 
address 316 
offset 316 
size 
code 316 
defined 316 
virtual 316 


page table entry (PTE) array 247 
PageMask register 248, 316, 317 
parity protection 197 

PClk, signal 83, 102, 354, 355 
PE, bit 262 


performance 
branch prediction 51 
cache 51 
R10000 48, 51 


Performance Counter interrupt 250 
Performance Counter register 270 
permanent register 360 


PFN 
bits 246 
fields, in EntryLo registers 246 
phase-locked loop 180 
physical memory addresses 316 
physical page frame number 245 
physical register, see also logical register 363 
PIdx, primary cache index 89 
pipeline 39 
definition of 358 
fetch 26, 39 


floating-point 27 
floating-point multiplier 26 
integer ALU 26 

latency 358 

Load/Store Unit 26 

out of order execution 358 
repeat rate 358 

sequence 358 

stage (definition) 358 
stage 1 39, 40 

stage 2 39 

stages 4-6 40 

stalls 35 


PLL 180 

PLLDis, signal 63, 64 

PLLRC, capacitor 229 

PM, field 262 

power interface signals, see also individual signals 58 


power supply 
levels, DC 218 
regulation 224 


power-on reset 181 


sequence 182 
PrcElmReq, mode bit 145, 175, 186 
PrcReqMax, mode bits 115, 135, 137, 143, 147, 186 
precise exceptions 37 
prediction, branch 362 
prediction, secondary cache, way 85 
prefetch instruction 47 
primary data cache, see also cache, primary data 29 
primary instruction cache, see also cache, primary instruction 29 
processor block read request protocol 135 


processor block write request 116 
protocol 139 


processor coherency data response 116 
protocol 161 


processor coherency state response protocol 160 


processor double/single/partial-word read request protocol 137 


processor double/single/partial-word write request protocol 141 


processor eliminate request protocol 145 
processor request 102, 108 

flow control protocol 147 

protocol 134 
processor response 102, 109 

protocols 159 


Processor Revision Identifier (PRId) register 261 


processor upgrade request 153 
protocol 143 
program order 35 
dynamic execution 35 
instruction completion 359 
instruction decoding 359 
instruction execution 359 
instruction fetching 359 
instruction graduation 359 
instruction issue 359 
protection 
ECC 197 
memory 316 
parity 197 
SECDED 197 
sparse encoding 197 
protocol 
arbitration, System interface 130 
error handling 209 
write back 65 
write invalidate cache coherency 65 


PTE (page table entry) 247 
PTEBase, field 247, 265 


Q 


queue 
address 26 
instruction 39 
integer 26 
R 
R, (region) field 251, 265 
R, bit 264 
R10000 processor 
ANDES architecture 24 
caches 24 


execution pipelines 26 
overview 24 


pipeline stages 25 
superscalar pipeline 25 
R4000 superpipeline 23 
Random entries 249 
Random register 244 
RE, (reverse endian) bit 253 
read port, FPU 292 


read sequences 90 
16-word 93 
32-word 93 
4-word 91 
8-word 92 
tag 94 

reference voltage 224 


DC 220 
register 
BadVAddr 247, 250, 265, 328 
boundary scan, JTAG 214 
bypass, JTAG 213 
CacheErr 195, 196, 198, 199, 280 
Cause 127, 128, 250, 258, 260 
Compare 128, 250 
Config 262 
Context 247, 265 
Count 128, 250 
CP0 (description of) 241 
dependency 36, 363 
Diagnostic 267 
ECC 91, 96, 279 
EntryHi 251 
EntryLo0 245 
EntryLo1 245 
EPC 260 
Error Exception Program Counter (ErrorEPC) 290 
Exception Program Counter (EPC) 260 
file 
FPU 292 
ports 363 
FrameMask 246, 266 
Index 243 
instruction, JTAG 213 
LLAddr 263 
logical, see also physical register 39, 363 
PageMask 248, 316 
Performance Counter 270 
permanent 360 
physical, see also logical register 39, 363 
Processor Revision Identifier (PRId) 261 
Random 244 
renaming 36, 360 
Status 195, 196 
ERL bit 304 
EXL bit 304 
SX bit 314 
TS bit 318 
USL field 304 
UX bit 314 
TagHi 91, 96, 284 
TagLo 91, 96, 284 


temporary 360 
unnamed 361 
WatchHi 264 
WatchLo 264 
Wired 244, 249 


write before reading (necessity for) 182 
XContext 265 


renaming, register 360 
repeat rate 49 


accessing secondary cache 51 
definition of 358 
FPU 291 


replacement algorithm, cache 29 
request cycle 102 
request number 109 


freeing with completion response 152 
request, outstanding 109 
Reserved Instruction exception 339 


reset 


cold 181, 184 
power-on 181, 182 
soft (warm) 181, 185 


response bus signals 62 

response cycle 102 

revision number, R10000 processor 261 
RM, field (FP) 301 

RN, field (FP) 301 

rounding modes, in FSR 301 

RP, (reduced power) bit 253 

RP, field (FP) 301 

rules, arbitration for System interface 131 


RZ, field (FP) 301 


S 


SB, (secondary cache block size) bit 262 
SC(A,B)Addr, signals 59, 84, 85 
SC(A,B)DWay, signals 59, 84, 92, 97 
SC, bit 262 
SCADCS, signal 59 
SCADOE, signal 59 
SCADWr, signal 59 
SCBDCS, signal 59 
SCBDOE, signal 59 
SCBDWr, signal 59 
SCBlkSize, mode bits 71, 82, 114, 187 
SCClk frequency 140, 161 
SCClk, signal 59, 83, 179 
SCClkDiv, mode bits 83, 178, 182, 187 
SCClkTap, mode bits 179, 188 
SCCorEn, mode bits 187, 201, 203 
SCData, signal 59 
SCDataChk, bus 200, 203 
SCDataChk, signal 59 
scheduling, dynamic 359 
SCSize, mode bits 71, 82, 187 
SCTag, signals 60, 88 
SCTagChk, bus 203 
SCTagChk, signal 60 
SCTagLSBAddr, signal 59, 85 
SCTCS, signal 60 
SCTOE, signal 60 
SCTWay, signal 60, 85, 87, 92 
SCTWr, signal 60 
SECDED 197 
secondary cache interface signals, see also individual signals 59 
secondary cache, see also cache, secondary 71 
SelDVCO, signal 63, 64 
serial operations 45 
serializing instruction 45 
setup times, AC electrical 223 
signal integrity 224 
decoupling capacitance 225 
maximum input voltage levels 224 


power supply regulation 224 
reference voltage 224 

signals 
power interface, see also individual signals 58 
secondary cache interface, see also individual signals 59 
System interface, see also individual signals 61 


test interface, see also individual signals 63 
size, page in memory 316 
SK, bit 262 
slave state 103 
and flow control 115 
soft (warm) reset 181, 185 
Soft Reset 
exception 325 
Soft Reset exception 320 
software interrupts 128 
SP, bit 279 
sparse encoding protection 197 
special interrupt vector 324 
specifications, test, AC electrical 222 
speculative branching 362 
speculative execution 36, 43, 362 
square-root unit, FPU 291 
SR, bit 256, 325, 327 
SS, (secondary cache size) field 262 
sseg space 309 
SSRAM 81, 86 
address signals 59 
clock signals 59 
data signals 59 
tag signals 60 
stage, definition of 358 
stalls, improving performance 35 
standard package configuration 227 
state 
master 103 
slave 103 
state bus signals 62 
Status register 195 
in FPU, see also FSR 294 
store operations, FPU registers 295 
stores 
and uncached buffer 76 
strong ordering 37 
example of 38 
superpipeline, architecture 23 
superpipeline, R4000 23 
superscalar 
pipeline 23 
processor 
definition of 23, 358 

superscalar processor 35 
Supervisor mode 304 
address mapping 308 
csseg space 309 
operations 308 
sseg space 309 
suseg space 308 
xsseg space 309 
xsuseg space 309 
suseg space 308 
switch, context 318 
SX, bit 254, 304, 314 
SYNC 
instruction 77, 170 
prevented from graduating 114 
SysAD, bus signals 61, 117, 122, 124, 205, 206, 348, 
350, 351, 352, 353, 354, 355 
SysAD[20:16] 
interrupt register 127 
SysAD[39:0] 
during address cycle 125 
Sea Pea ie 
during address cycle 
SysAD[57] 
secondary cache block way indication 125 
SysAD[59:58] 
uncached attribute 124 
SysAD[63:0] 
address cycle encoding 124 
data cycle encoding 126 
SysAD[63:60] 
address cycle 124 
interrupt 127 
SysADChk, bus 206 
SysADChk, signal 62, 186 
SysClk cycle 115, 149, 170 
SysClk, signal 61, 102, 126, 128, 130, 131, 135, 143, 
147, 176, 177, 222, 233, 354, 355 
SysClkDiv, mode bits 178, 182, 187 
SysClkRet, signal 61, 178, 180 
SysCmd, bus 61, 117, 194, 205, 206 
SysCmd[0] 112 
ECC 122 
processor data cycles 122 
SysCmd[10:8] 117 
data response 121 
external intervention and invalidate requests 120 
SysCmd[11:0] 
map 123 
protocol 129 
SysCmd[11] 117 
SysCmd[2:0] 
processor write requests 120 
SysCmd[2:1] 
block data response 122 
processor requests 119 
SysCmd[4:3] 
data cycles 122 
external special requests 121 
processor read requests 118 
processor upgrade requests 119 
SysCmd[5] 
data cycles 121 
SysCmd[5], bit 112 
SysCmd[7:5] 
external requests 120 


processor requests 118 
SysCmdPar, signal 61, 205 
SysCorErr, signal 62, 192, 201, 203, 206 
SysCyc, signal 62, 176 
SysGblPerf, signal 62, 77, 170 


SysGnt, signal 61, 130, 131, 132, 134, 136, 138, 140, 
142, 144, 146, 149, 154, 155, 156, 157, 158, 
161, 170, 182, 184, 185, 324, 325, 349, 350 


SysNMI, signal 62, 128, 327 
SysRdRdy, signal 61, 131, 135, 137, 143, 147 
and flow control 115 


SysRel, signal 61, 130, 132, 134, 136, 138, 140, 142, 
144, 146, 149, 154, 155, 156, 157, 158, 161, 
170 


SysReq, signal 61, 130, 131, 134, 136, 138, 140, 142, 
144, 146, 161, 170, 184 


SysReset, signal 62, 182, 184, 185, 212, 223, 324, 325, 
326, 349, 350 


SysResp, bus 62, 117, 127, 208 
SysResp[4:0] 
external completion response 152 
SysResp[4:2] 
driving completion indication 127 
SysRespPar, signal 62, 208 
SysRespVal, signal 62, 152, 182, 184, 185, 208 
SysState, bus 62, 117, 126, 194, 208 
SysState[0] 


processor coherency data response 168 
SysState[2:0] 

encoding 126 
SysStatePar, signal 62, 208 
SysStateVal, signal 62, 126 
System Call exception 337 


system configuration 
multiprocessor 55 
uniprocessor 54 
System interface 31, 101 
arbitration 
in a cluster bus system 104, 133 
in a uniprocessor system 132 
protocol 130 
rules 131 
block write request protocol 139 
buffers 111 
bus encoding 
description of buses 117 
SysAD 124 
SysCmd 117 
SysResp 127 
SysState 126 
cached request buffer 111 
clock domain 178 
cluster bus 104 
cluster request buffer 111 
coherency 163 
coherency conflicts, action taken 165 
connecting to an external agent 103 
connections to various system configurations 105 
directory-based coherency protocol 175 
error handling 
on buses 205 
on SysAD bus 206 
on SysCmd bus 205 
on SysResp bus 208 
on SysState bus 208 
schemes 204 
error protection 
for buses 204 
schemes 204 
external agent 101 
external allocate request number request protocol 156 
external block data response protocol 149 
external coherency requests, action taken 164 
external completion response protocol 152 
external data response flow control 115, 116 


external double/single/partial-word data response protocol 
external duplicate tags, support for 174 
external interrupt request protocol 158 
external intervention exclusive request 163 
external intervention request protocol 155 
external intervention shared request 163 
external invalidate request 163 

protocol 157 
external request 102, 109 

flow control 115 

protocol 154 
external response 102, 109 

protocol 149 
flow control 115 
frequencies 102 
grant parking 130 
hardware emulation, support for 176 
I/O 174 
incoming buffer 112 
internal coherency conflicts 165 
interrupts 127 
master state 103 
multiprocessor connections 

with cluster bus 107 

with dedicated external agents 106 
outgoing buffer 113 
outstanding processor requests 109 
outstanding requests on the System interface 109 
port 24 
processor block read request protocol 135 
processor coherency data response protocol 161 
processor coherency state response protocol 160 


processor double/single/partial-word read request protocol 137 


processor double/single/partial-word write request protocol 141 


processor eliminate request protocol 145 
processor request 102, 108 
flow control protocol 147 
protocol 134 
processor response 102, 109 
protocols 159 
processor upgrade request protocol 143 


register-to-register operation 102 


request 108 
cycle 102 
number field 109 
protocol 134 
response 108 
cycle 102 
protocol 134 


signals 61, 103 

slave state 103 

split transaction 109 
support for I/O 174 
uncached attribute 175 
uncached buffer 114 


uniprocessor connections 105 


SysUncErr, signal 62, 193, 194, 198, 199, 203 


SysVal, signal 62, 135, 137, 139, 141, 143, 145, 149, 
151, 155, 156, 157, 158, 161, 205, 348, 352, 
353, 354, 355 


SysWrRdy, signal 61, 140, 141, 145, 147, 161 


and flow control 115 


T 


table 
busy-bit 360 
mapping 363 
tag bus, secondary cache, SCTag 88 
tag read sequence 94 
tag write sequence 99 
TagHi register 91, 96, 284 
TagLo register 91, 96, 284 
tags, external, duplicate 174 
TAP controller 212, 213 
TCA, signal 63, 64 
TCB, signal 63, 64 
temporary register 360 
test access port (TAP) 212 
test interface signals, see also individual signals 63 
test mode, cache 349, 350 
test signals, miscellaneous 63 


Timer interrupt 128 
disabling 250 


TLB 317 
32-bit-mode entry format 317 
64-bit-mode entry format 317 
address 
translation, avoiding multiple matches 318 
ASID field 318 
avoiding conflict 318 
Cache Algorithm fields 317 
entry formats 317 
exceptions 329 
Global (G) bit 318 
ITLB 318 
misses 247 
multiple matches, avoiding 318 
number of entries 317 

page size code 316 

used with Context register 247 


TLB (Translation Lookaside Buffer) 27 
JTLB 318 


TLB Invalid exception 329, 331 

TLB Modified exception 329, 332 

TLB Probe (TLBP) instruction 243, 251 
TLB Read (TLBR) instruction 243 

TLB Read Indexed (TLBR) instruction 251 
TLB Refill 321 

TLB Refill exception 329, 330 


TLB Write Indexed (TLBWI) instruction 243, 251 


TLB Write Random instruction 244, 251 


Translation Look-Aside Buffer, see also TLB 317 


translation, virtual address 316, 318 

Trap exception 336 

trap physical address, and Watch registers 264 
TriState, signal 214 

TS, (TLB shutdown) bit 255, 256 

TS, bit, in Status register 318 


two-level cache structure 65 


U 
UC, (uncached attribute) bit 245 


uncached 

accelerated 
blocks, completely gathered 76 
blocks, incompletely gathered 76 
stores 76 

attribute, support for 175 

buffer 111, 114 

cache algorithm 74, 75 


uncached accelerated 246 
uncached accelerated, cache algorithm 74, 76 
uncached attribute 246 


uncorrectable error 193 


detection, suppressed 196 
flag 112, 114 


underflow (FP) 300 
unimplemented operation (FP) 300 


uniprocessor system 54, 105 
arbitration rules 132 


unnaming, register 361 
useg space 306, 307 


User mode 304 
address mapping 306 
operations 306 
useg space 307 
xuseg space 307 


UX, bit 254, 304, 314 


V 


V, (valid) bit 245 

Vcc, signal 58, 218, 229 
VccPa, signal 58 

VccPd, signal 58 

VccQ, signal 218, 219, 221 
VccQSC, signal 58, 218, 229 
VccQSys, signal 58, 218, 229 
vector locations, TLB refill 322 
vector, special interrupt 324 


virtual address 
space 305 


translation 316 
virtual aliasing 89 
Virtual Coherency exception 333 
virtual memory addresses 316 


voltage 
input, maximum 224 
reference 224 


VPN2, field 251 

Vref, signal 224 
VrefByp, signal 58 
VrefSC, signal 58, 220 
VrefSys, signal 58, 220 
Vss, signal 58, 229 
VssPa, signal 58 

VssPd, signal 58 


W 

W, bit 264 

Watch exception 342 
WatchHi register 264 
WatchLo register 264 


way prediction table, secondary cache 86 


Wired entries 249 
Wired register 244, 249 
write back protocol 65 
primary data cache 68 


write sequences 95 
16-word 98 
32-word 98 
4-word 96 
8-word 97 
tag 99 


X 


XContext register 265 

xkphys 
decoding virtual address bits VA(61:59) 318 
space 312 


xkseg space 314 

xksseg space 312 

xkuseg space 312 

xsseg space 309 

xsuseg space 309 

XTLB Refill 321 

XTLB refill handler, used with XContext register 265 
xuseg space 306, 307 

XX, (MIPS IV User mode) bit 252, 254, 304, 344 